On Thu, Jan 19, 2012 at 1:05 PM, Christophe Gisquet <[email protected]> wrote:
> Hi,
>
> Note:
> - I have tried SSE4 (and pextrd for the remaining cases) with a
> testcase on fewer elements (eg mpc7), but that was not faster
> - Unrolling SSSE3 only up to 4 is slower
> - Unrolling up to 16 doesn't change things
>
> I'm not sure it's worth writing a MMX2 version. Maybe building several
> versions to take advantage of palignr could help, but I don't really
> want to bother with that much code increase.

Is the SSE2 version faster than using 64-bit bswap on x86_64? If not,
it shouldn't be used on 64-bit.

Jason
_______________________________________________
libav-devel mailing list
[email protected]
https://lists.libav.org/mailman/listinfo/libav-devel
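
For readers following the thread, here is a rough sketch of what the "64-bit bswap" baseline in the question above might look like. It assumes the routine under discussion byte-swaps a buffer of 32-bit words, which the messages imply (pextrd, bswap) but do not state. The names bswap64 and bswap32_buf_sketch are hypothetical and are not taken from the patch or from libav. The idea is to reverse all eight bytes of a 64-bit word, then rotate by 32 bits so the two 32-bit elements return to their original positions.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: portable byte reversal of one 64-bit word.
 * Many compilers recognize this pattern; on x86_64 it corresponds
 * to what a single bswap instruction does. */
static uint64_t bswap64(uint64_t x)
{
    x = ((x & 0x00FF00FF00FF00FFULL) << 8)  | ((x >> 8)  & 0x00FF00FF00FF00FFULL);
    x = ((x & 0x0000FFFF0000FFFFULL) << 16) | ((x >> 16) & 0x0000FFFF0000FFFFULL);
    return (x << 32) | (x >> 32);
}

/* Hypothetical routine: byte-swap n 32-bit words, two per iteration.
 * Assumption: the buffer holds 32-bit elements, as the thread suggests. */
static void bswap32_buf_sketch(uint32_t *dst, const uint32_t *src, size_t n)
{
    size_t i = 0;

    for (; i + 2 <= n; i += 2) {
        uint64_t v;
        memcpy(&v, src + i, sizeof(v));   /* unaligned-safe load of two words */
        v = bswap64(v);                   /* reverses bytes, but also swaps the two words */
        v = (v << 32) | (v >> 32);        /* rotate by 32 to restore element order */
        memcpy(dst + i, &v, sizeof(v));
    }

    if (i < n) {                          /* odd trailing element */
        uint32_t x = src[i];
        dst[i] = (x >> 24) | ((x >> 8) & 0x0000FF00u) |
                 ((x << 8) & 0x00FF0000u) | (x << 24);
    }
}
```

Whether this beats the SSE2 version is exactly the open question. The sketch only shows the shape of the scalar code one would benchmark against, not a measured result.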
