Here are some extra implementations that extend Christophe's work.

Differences with v1:

* AVX/FMA3: Removed the main loop and related bookkeeping for x64, since
that loop would run only once anyway.
* FMA3: Replaced mulps+subps with FMA3 instructions, meaning two fewer
instructions run per loop in that version.
* Removed some unnecessary preprocessor guards and added some missing ones.

Knowing that AMD currently has lackluster performance with ymm registers, I
could add an FMA4 version of this function using xmm registers, which, unlike
the AVX/FMA3 ymm versions, would benefit those processors. Thoughts?

James Almer (3):
  x86/synth_filter: add synth_filter_sse
  x86/synth_filter: add synth_filter_avx
  x86/synth_filter: add synth_filter_fma3

 libavcodec/x86/dcadsp.asm    | 138 ++++++++++++++++++++++++++++++++-----------
 libavcodec/x86/dcadsp_init.c |  55 +++++++++++------
 2 files changed, 143 insertions(+), 50 deletions(-)

-- 
1.8.3.2

_______________________________________________
libav-devel mailing list
[email protected]
https://lists.libav.org/mailman/listinfo/libav-devel
