2014-02-06 Janne Grunau <janne-li...@jannau.net>:
> The function is very short so the function call overhead becomes
> significant. 34 vs. 39 cycles on a cortex-a9, i.e. the inline version
> is over 10% faster.

Yes, ARM does the same, and with good reason.

The overhead is probably about the same (10%?) on x86_32, and a bit less on x86_64.

x86_32 can't assume any instruction set extension anyway, so it uses yasm
functions called through the DSP context.
On x86_64, the upshot is that the inline SSE2 version ends up equivalent to
the SSE4 version called through the context.
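
To make the two call paths concrete, here is a rough sketch of what I mean.
All names (scale8_inline, scale8_c, DemoDSPContext, demo_dsp_init) are made
up for illustration and are not the actual function or context in the patch;
the plain C body merely stands in for the SSE2 inline asm and the yasm
function.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical short kernel, small enough that call overhead matters.
 * In the real code this would be the inline (e.g. SSE2) version. */
static inline void scale8_inline(float *dst, const int8_t *src, int scale)
{
    for (size_t i = 0; i < 8; i++)
        dst[i] = (float)(src[i] * scale);
}

/* Out-of-line variant, standing in for an external asm function. */
void scale8_c(float *dst, const int8_t *src, int scale)
{
    scale8_inline(dst, src, scale);
}

/* Hypothetical DSP context: the function pointer is chosen once at init
 * time, and every later call is an indirect call through it. */
typedef struct DemoDSPContext {
    void (*scale8)(float *dst, const int8_t *src, int scale);
} DemoDSPContext;

void demo_dsp_init(DemoDSPContext *c)
{
    /* Real code would test CPU flags here and pick the best version. */
    c->scale8 = scale8_c;
}
```

A caller would use either scale8_inline(dst, src, scale) directly, which the
compiler can fold into the loop around it, or c->scale8(dst, src, scale),
which pays for the indirect call but can pick a faster implementation at
runtime.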

I haven't benchmarked the overall improvement, but I doubt it is that
big anyway.

-- 
Christophe
_______________________________________________
libav-devel mailing list
libav-devel@libav.org
https://lists.libav.org/mailman/listinfo/libav-devel
