[libav-devel] [PATCH 0/3 v2] synth filter float ASM

James Almer Thu, 20 Mar 2014 11:38:29 -0700

Here are some extra implementations that extend Christophe's work.

Differences with v1:


* AVX/FMA3: Removed the main loop and related bookkeeping for x64, since said
loop would be run only once anyway.
* FMA3: Replaced mulps+subps with FMA3 instructions, meaning two fewer
instructions run per loop in that version.
* Removed some unnecessary preprocessor guards and added some missing ones.

Knowing that AMD currently has lackluster performance with ymm registers, I
could add an FMA4 version of this function using xmm registers, which would
benefit said processors, unlike the AVX/FMA3 ymm ones. Thoughts?

James Almer (3):
  x86/synth_filter: add synth_filter_sse
  x86/synth_filter: add synth_filter_avx
  x86/synth_filter: add synth_filter_fma3

 libavcodec/x86/dcadsp.asm    | 138 ++++++++++++++++++++++++++++++++-----------
 libavcodec/x86/dcadsp_init.c |  55 +++++++++++------
 2 files changed, 143 insertions(+), 50 deletions(-)

-- 
1.8.3.2

_______________________________________________
libav-devel mailing list
[email protected]
https://lists.libav.org/mailman/listinfo/libav-devel
