Hi,

Notes:
- I tried SSE4 (using pextrd for the remaining elements) with a
test case on fewer elements (e.g. mpc7), but it was not faster.
- Unrolling the SSSE3 version only up to 4 is slower.
- Unrolling up to 16 doesn't change anything.
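For reference, here is a minimal sketch of what an SSSE3 bswap_buf boils down to. It assumes a dsputil-style prototype (dst, src, w) with w counted in 32-bit words. The function names are hypothetical and this is not the code in the attached patch. pshufb (_mm_shuffle_epi8) reverses the bytes of four dwords at once. The unroll factor of 8 words per iteration is only an illustrative choice, and a scalar loop handles the remainder.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__SSSE3__)
#include <tmmintrin.h> /* SSSE3: _mm_shuffle_epi8 (pshufb) */
#endif

/* Portable byte swap of one 32-bit word. */
static uint32_t bswap32_c(uint32_t x)
{
    return (x << 24) | ((x << 8) & 0x00ff0000u) |
           ((x >> 8) & 0x0000ff00u) | (x >> 24);
}

/* Hypothetical sketch: dst[i] = bswap32(src[i]) for 0 <= i < w. */
static void bswap_buf_sketch(uint32_t *dst, const uint32_t *src, int w)
{
    int i = 0;
#if defined(__SSSE3__)
    /* pshufb mask that reverses the bytes inside each of the four dwords. */
    const __m128i mask = _mm_set_epi8(12, 13, 14, 15, 8, 9, 10, 11,
                                      4, 5, 6, 7, 0, 1, 2, 3);
    /* Main loop: two 16-byte vectors (8 words) per iteration. */
    for (; i + 8 <= w; i += 8) {
        __m128i a = _mm_loadu_si128((const __m128i *)(src + i));
        __m128i b = _mm_loadu_si128((const __m128i *)(src + i + 4));
        _mm_storeu_si128((__m128i *)(dst + i), _mm_shuffle_epi8(a, mask));
        _mm_storeu_si128((__m128i *)(dst + i + 4), _mm_shuffle_epi8(b, mask));
    }
#endif
    /* Scalar tail, or the whole buffer when SSSE3 is not enabled. */
    for (; i < w; i++)
        dst[i] = bswap32_c(src[i]);
}
```

The SSSE3 path is only compiled when the compiler defines __SSSE3__ (e.g. with -mssse3). Otherwise the scalar loop handles the whole buffer.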

I'm not sure it's worth writing an MMX2 version. Building several
versions to take advantage of palignr might help, but I'd rather not
add that much code.

Note: all the mingw versions of gcc I tested (official 4.5.2, TDM's
4.5.2 and 4.6.1) miscompile av_bswap32 into a series of ror
instructions. This is with no options passed to configure and no
preset CFLAGS.
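To illustrate the codegen issue, here is a small sketch with two formulations of a 32-bit byte swap. The function names are hypothetical. The first builds the swap from two 16-bit swaps, which compilers may lower to rotate instructions (rol/ror) rather than one bswap. The second uses the GCC/Clang builtin __builtin_bswap32, which normally maps to a single bswap on x86. Neither is claimed to be libav's actual av_bswap32 definition.

```c
#include <assert.h>
#include <stdint.h>

/* Swap the two bytes of a 16-bit value. */
static uint16_t bswap16_halves(uint16_t x)
{
    return (uint16_t)((x << 8) | (x >> 8));
}

/* 32-bit swap built from two 16-bit swaps (swapped halves, swapped bytes). */
static uint32_t bswap32_halves(uint32_t x)
{
    return ((uint32_t)bswap16_halves((uint16_t)x) << 16) |
           bswap16_halves((uint16_t)(x >> 16));
}

#if defined(__GNUC__)
/* GCC/Clang builtin, usually emitted as a single bswap instruction on x86. */
static uint32_t bswap32_builtin(uint32_t x) { return __builtin_bswap32(x); }
#else
/* Fallback for other compilers: reuse the portable version. */
static uint32_t bswap32_builtin(uint32_t x) { return bswap32_halves(x); }
#endif
```

Both functions return the same value. Compiling with gcc -O2 -S and reading the assembly is one way to check whether a given compiler emits one bswap or a rotate sequence for each.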

Best regards,
Christophe

Attachment: 0004-dsputil-provide-SIMD-versions-of-bswap_buf.patch
Description: Binary data

_______________________________________________
libav-devel mailing list
[email protected]
https://lists.libav.org/mailman/listinfo/libav-devel
