Hi,

Note:
- I have tried SSE4 (and pextrd for the remaining cases) with a test case on fewer elements (e.g. mpc7), but that was not faster
- Unrolling the SSSE3 loop only up to 4 is slower
- Unrolling up to 16 doesn't change performance
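A minimal sketch of what an SSSE3 bswap_buf can look like: a main loop unrolled by 2 (8 words per iteration), one optional 4-word step, and a scalar tail. This is not the code from the attached patch. It assumes pshufb (`_mm_shuffle_epi8`) does the per-word byte reversal and a dsputil-style prototype `(uint32_t *dst, const uint32_t *src, int w)`; the names `bswap_buf_c`, `bswap_buf_ssse3` and `bswap_buf` are hypothetical. It relies on GCC/Clang extensions (the `target` attribute, `__builtin_bswap32`, `__builtin_cpu_supports`).

```c
#include <stdint.h>

/* Hypothetical scalar reference: byte-swap w 32-bit words. */
static void bswap_buf_c(uint32_t *dst, const uint32_t *src, int w)
{
    for (int i = 0; i < w; i++)
        dst[i] = __builtin_bswap32(src[i]);
}

#if defined(__x86_64__) || defined(__i386__)
#include <tmmintrin.h> /* SSSE3: _mm_shuffle_epi8 (pshufb) */

/* Hypothetical SSSE3 version, compiled for SSSE3 via the target attribute,
 * so it must only be called after a CPU feature check. Unaligned loads and
 * stores keep the sketch simple. */
__attribute__((target("ssse3")))
static void bswap_buf_ssse3(uint32_t *dst, const uint32_t *src, int w)
{
    /* pshufb mask reversing the 4 bytes inside each 32-bit lane */
    const __m128i mask = _mm_set_epi8(12, 13, 14, 15, 8, 9, 10, 11,
                                      4, 5, 6, 7, 0, 1, 2, 3);
    int i = 0;

    /* Main loop unrolled by 2: 8 words (32 bytes) per iteration. */
    for (; i + 8 <= w; i += 8) {
        __m128i a = _mm_loadu_si128((const __m128i *)(src + i));
        __m128i b = _mm_loadu_si128((const __m128i *)(src + i + 4));
        _mm_storeu_si128((__m128i *)(dst + i),     _mm_shuffle_epi8(a, mask));
        _mm_storeu_si128((__m128i *)(dst + i + 4), _mm_shuffle_epi8(b, mask));
    }
    /* At most one remaining block of 4 words. */
    if (i + 4 <= w) {
        __m128i a = _mm_loadu_si128((const __m128i *)(src + i));
        _mm_storeu_si128((__m128i *)(dst + i), _mm_shuffle_epi8(a, mask));
        i += 4;
    }
    /* Scalar tail for the last 0..3 words. */
    bswap_buf_c(dst + i, src + i, w - i);
}
#endif

/* Hypothetical runtime dispatch: SSSE3 when available, else plain C. */
static void bswap_buf(uint32_t *dst, const uint32_t *src, int w)
{
#if defined(__x86_64__) || defined(__i386__)
    if (__builtin_cpu_supports("ssse3")) {
        bswap_buf_ssse3(dst, src, w);
        return;
    }
#endif
    bswap_buf_c(dst, src, w);
}
```

Calling `bswap_buf(dst, src, 15)` goes through one unrolled iteration, one 4-word block and a 3-word scalar tail, so all three paths are exercised; comparing the result with `bswap_buf_c` is a quick correctness check when trying other unroll factors.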
I'm not sure it's worth writing an MMX2 version. Building several versions to take advantage of palignr might help, but I don't really want to bother with that much extra code.

Note: all the MinGW versions of gcc I tested (official 4.5.2, TDM's 4.5.2 and 4.6.1) miscompile av_bswap32 as a series of ror instructions. No option was passed to configure, and no cflags were preset.

Best regards,
Christophe
0004-dsputil-provide-SIMD-versions-of-bswap_buf.patch
_______________________________________________ libav-devel mailing list [email protected] https://lists.libav.org/mailman/listinfo/libav-devel
