Re: [libav-devel] [PATCH 02/11] x86: dcadsp: implement int8x8_fmul_int32

Christophe Gisquet Thu, 06 Feb 2014 09:44:13 -0800

2014-02-06 Janne Grunau <janne-li...@jannau.net>:
> The function is very short so the function call overhead becomes
> significant. 34 vs. 39 cycles on a cortex-a9, i.e. the inline version
> is over 10% faster.


Yes, ARM does the same, and for good reason.

The overhead (10%?) is probably about the same on x86_32, and a bit less on x86_64.

x86_32 can't assume any instruction set anyway, so it uses yasm
functions called through the DSP context.
On x86_64, this puts the inline SSE2 version on par with the SSE4
function called through the context.
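A minimal C sketch of the two call paths being compared. It assumes int8x8_fmul_int32 scales eight int8_t values by scale / 16 into floats, modelled on the C reference. HypotheticalDSPContext and hypothetical_dsp_init are made-up names, not the actual libav API:

```c
#include <assert.h>
#include <stdint.h>

/* Assumed semantics: dst[i] = src[i] * scale / 16 for eight values.
 * Declared inline so the compiler can fold it into the caller. */
static inline void int8x8_fmul_int32_inline(float *dst, const int8_t *src,
                                            int scale)
{
    float fscale = scale / 16.0f;
    for (int i = 0; i < 8; i++)
        dst[i] = src[i] * fscale;
}

/* Out-of-line version with the same body, so it can be stored in a
 * function pointer and reached through a context. */
static void int8x8_fmul_int32_c(float *dst, const int8_t *src, int scale)
{
    assert(dst && src);
    int8x8_fmul_int32_inline(dst, src, scale);
}

/* Hypothetical DSP context: the implementation is chosen once at init,
 * then every use pays for an indirect call. */
typedef struct HypotheticalDSPContext {
    void (*int8x8_fmul_int32)(float *dst, const int8_t *src, int scale);
} HypotheticalDSPContext;

static void hypothetical_dsp_init(HypotheticalDSPContext *ctx)
{
    /* A real init would check CPU flags and install SIMD versions here. */
    ctx->int8x8_fmul_int32 = int8x8_fmul_int32_c;
}
```

Both paths compute the same result. The inline call avoids call and return overhead on an eight-element loop. The context call costs an indirect call each time, but lets x86_32 pick a SIMD version at runtime without assuming any instruction set at compile time.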

I haven't benchmarked the overall improvement, but I doubt it is that
big anyway.

-- 
Christophe
_______________________________________________
libav-devel mailing list
libav-devel@libav.org
https://lists.libav.org/mailman/listinfo/libav-devel
