On Thu, Jan 19, 2012 at 1:05 PM, Christophe Gisquet
<[email protected]> wrote:
> Hi,
>
> Note:
> - I have tried SSE4 (and pextrd for the remaining cases) with a
> testcase on fewer elements (e.g. mpc7), but that was not faster
> - Unrolling SSSE3 only up to 4 is slower
> - Unrolling up to 16 doesn't change things
>
> I'm not sure it's worth writing an MMX2 version. Maybe building several
> versions to take advantage of palignr could help, but I don't really
> want to bother with that much code increase.

Is the SSE2 version faster than using 64-bit bswap on x86_64?  If not,
it shouldn't be used on 64-bit.
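
For reference, a minimal sketch of what "using 64-bit bswap" could look
like, assuming the routine under discussion byte-swaps a buffer of 32-bit
words. The function names below are hypothetical and not taken from the
patch. Each iteration loads two words as one 64-bit value, reverses all
eight bytes, then rotates by 32 bits to put the two words back in order.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: reverse the 8 bytes of x.
 * GCC and Clang provide __builtin_bswap64, which maps to a single
 * bswap instruction on x86_64; the fallback is plain shifts and masks. */
static uint64_t bswap64_sketch(uint64_t x)
{
#if defined(__GNUC__) || defined(__clang__)
    return __builtin_bswap64(x);
#else
    x = ((x & 0x00FF00FF00FF00FFULL) << 8)  | ((x >> 8)  & 0x00FF00FF00FF00FFULL);
    x = ((x & 0x0000FFFF0000FFFFULL) << 16) | ((x >> 16) & 0x0000FFFF0000FFFFULL);
    return (x << 32) | (x >> 32);
#endif
}

/* Hypothetical routine: byte-swap n 32-bit words from src into dst,
 * two words per 64-bit bswap. */
static void bswap32_buf_sketch(uint32_t *dst, const uint32_t *src, size_t n)
{
    size_t i = 0;

    for (; i + 2 <= n; i += 2) {
        uint64_t v;
        memcpy(&v, src + i, sizeof(v)); /* unaligned-safe 64-bit load */
        v = bswap64_sketch(v);          /* reverses bytes AND word order */
        v = (v << 32) | (v >> 32);      /* rotate: restore word order */
        memcpy(dst + i, &v, sizeof(v));
    }
    if (i < n) {                        /* odd trailing word */
        uint32_t w = src[i];
        dst[i] = (w >> 24) | ((w >> 8) & 0x0000FF00u) |
                 ((w << 8) & 0x00FF0000u) | (w << 24);
    }
}
```

Whether this beats the SSE2 loop would have to be measured on the same
testcases; the sketch only shows the scalar alternative being asked about.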

Jason
_______________________________________________
libav-devel mailing list
[email protected]
https://lists.libav.org/mailman/listinfo/libav-devel