2012/12/22 Christophe Gisquet <[email protected]>:
> However, I'll postpone that patch until I can get it to run as fast as
> the IEEE754 version... Yes I'm not kidding, even after further
> unrolling of the sse2 and fixing that IEEE754 function, the latter is
> faster on Win64/Arrandale.

So here's the latest version. I got it down to 70 cycles on
Win64/Arrandale, but that's about the same timing (if anything a bit
higher) as the IEEE754 version.

This is because the latter function gets fully unrolled. The current
SSE loop runs twice IIRC, and given its length, it may be OK to
completely unroll it as well.
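
To illustrate the unrolling trade-off, here is a minimal sketch in plain C.
It is not the code from the attached patch: body4(), shuffle_looped() and
shuffle_unrolled() are hypothetical names, and the sign flip is only a
stand-in for whatever one pass of the real loop does. It only shows that a
loop running twice can be replaced by two copies of its body, which drops
the counter update and the branch.

```c
#include <stddef.h>

/* Hypothetical stand-in for one pass of the loop body: negate 4 floats.
 * The real SSE code handles 4 floats per register; the actual operation
 * in the patch is not reproduced here. */
static void body4(float *dst, const float *src)
{
    for (size_t i = 0; i < 4; i++)
        dst[i] = -src[i];
}

/* Looped form: two iterations, each paying for a counter update and a
 * conditional branch. */
static void shuffle_looped(float *dst, const float *src)
{
    for (size_t k = 0; k < 8; k += 4)
        body4(dst + k, src + k);
}

/* Fully unrolled form: same work, with no loop counter or branch. */
static void shuffle_unrolled(float *dst, const float *src)
{
    body4(dst, src);
    body4(dst + 4, src + 4);
}
```

Both forms produce identical output; whether the unrolled one is actually
faster depends on code size and the target CPU, so it still needs timing.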

-- 
Christophe

Attachment: 0003-SBR-DSP-x86-implement-SSE-qmf_pre_shuffle.patch
