2014-02-06 Janne Grunau <janne-li...@jannau.net>:
> The function is very short so the function call overhead becomes
> significant. 34 vs. 39 cycles on a cortex-a9, i.e. the inline version
> is over 10% faster.

Yes, ARM does the same, and with good reason. The overhead is probably
about the same (10%?) on x86_32 and a bit less on x86_64. x86_32 can't
assume any instruction set extension anyway, so it uses yasm functions
called through the DSP context. On x86_64, the result is that inline SSE2
is equivalent to an SSE4 callee.

I haven't benchmarked the overall improvement, but I doubt it is that big
anyway.

-- 
Christophe
_______________________________________________
libav-devel mailing list
libav-devel@libav.org
https://lists.libav.org/mailman/listinfo/libav-devel
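
For readers following the thread, here is a minimal sketch of the two call
styles being compared: a short static inline function versus the same work
reached through a function pointer in a context struct. All names here
(sum8_inline, sum8_c, DSPContext, dsp_init, self_check) are hypothetical
and not taken from the libav tree, and the kernel is a placeholder rather
than the function under discussion. It illustrates only the dispatch
difference and makes no claim about cycle counts.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical short kernel: sum of eight bytes.
 * Being static inline, the compiler may expand it at the call site,
 * so no call/return overhead is paid. */
static inline int sum8_inline(const uint8_t *p)
{
    int s = 0;
    for (int i = 0; i < 8; i++)
        s += p[i];
    return s;
}

/* Out-of-line version with the same behaviour; a stand-in for an
 * external assembly function (assumption, not real libav code). */
static int sum8_c(const uint8_t *p)
{
    return sum8_inline(p);
}

/* Hypothetical DSP context: a table of function pointers. */
typedef struct DSPContext {
    int (*sum8)(const uint8_t *p);
} DSPContext;

/* A real init would pick an implementation from runtime CPU detection;
 * this sketch always installs the C fallback. */
static void dsp_init(DSPContext *c)
{
    c->sum8 = sum8_c;
}

/* Both paths must agree; only the way the call is made differs. */
static void self_check(void)
{
    const uint8_t buf[8] = { 1, 2, 3, 4, 5, 6, 7, 8 };
    DSPContext dsp;

    dsp_init(&dsp);
    assert(sum8_inline(buf) == dsp.sum8(buf)); /* direct vs. indirect call */
}
```

The indirect call lets the implementation be chosen at runtime, which is
what a build that can't assume an instruction set needs. The inline form
needs the instruction set to be known at compile time, but avoids the call
overhead that matters for very short functions.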