2012/12/22 Christophe Gisquet <[email protected]>:
> However, I'll postpone that patch until I can get it to run as fast as
> the IEEE754 version... Yes, I'm not kidding: even after further
> unrolling of the SSE2 version and fixing that IEEE754 function, the
> latter is faster on Win64/Arrandale.
So here's the latest version. I got it down to 70 cycles on Win64/Arrandale, but that is the same timing as (if not a bit higher than) the IEEE754 version. This is due to the latter function getting fully unrolled. The current SSE loop runs twice IIRC, and given its length, it may be OK to completely unroll it as well.

--
Christophe
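For anyone reading without the patch at hand, here is a rough scalar sketch of the two styles being compared: a plain float version, and an "IEEE754" version that negates by flipping the sign bit and is unrolled by two. The index pattern, the function names (`pre_shuffle_ref`, `pre_shuffle_bits`) and the 128-float buffer are assumptions made for illustration; none of it is copied from the attached patch.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical reference shuffle on a 128-float buffer (assumed layout). */
static void pre_shuffle_ref(float *z)
{
    int k;
    z[64] = z[0];
    z[65] = z[1];
    for (k = 1; k < 32; k++) {
        z[64 + 2 * k]     = -z[64 - k];  /* negated, read backwards */
        z[64 + 2 * k + 1] =  z[k + 1];   /* copied, read forwards   */
    }
}

/* Helpers to move raw float bits without aliasing problems. */
static uint32_t load_bits(const float *p)
{
    uint32_t u;
    memcpy(&u, p, sizeof u);
    return u;
}

static void store_bits(float *p, uint32_t u)
{
    memcpy(p, &u, sizeof u);
}

/* Same shuffle on integer bit patterns: negation is an XOR of the
 * IEEE754 sign bit, and the loop body handles k and k + 1 per pass. */
static void pre_shuffle_bits(float *z)
{
    const uint32_t sign = 0x80000000u;
    int k;
    store_bits(&z[64], load_bits(&z[0]));
    store_bits(&z[65], load_bits(&z[1]));
    for (k = 1; k < 31; k += 2) {
        store_bits(&z[64 + 2 * k + 0], load_bits(&z[64 - k]) ^ sign);
        store_bits(&z[64 + 2 * k + 1], load_bits(&z[k + 1]));
        store_bits(&z[64 + 2 * k + 2], load_bits(&z[63 - k]) ^ sign);
        store_bits(&z[64 + 2 * k + 3], load_bits(&z[k + 2]));
    }
    /* Tail for k = 31, which the unrolled loop does not reach. */
    store_bits(&z[126], load_bits(&z[33]) ^ sign);
    store_bits(&z[127], load_bits(&z[32]));
}
```

With a trip count this small, a compiler can plausibly flatten the integer version completely, which would fit the timings above; whether it actually does depends on the compiler and flags.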
0003-SBR-DSP-x86-implement-SSE-qmf_pre_shuffle.patch
Description: Binary data
_______________________________________________
libav-devel mailing list
[email protected]
https://lists.libav.org/mailman/listinfo/libav-devel
