On Wed, Dec 13, 2017 at 6:07 AM, Martin Vignali <martin.vign...@gmail.com> wrote:
> +        vpermq  m1, [srcq + xq -     mmsize + %3], 0x4e; flip each lane at load
> +        vpermq  m2, [srcq + xq - 2 * mmsize + %3], 0x4e; flip each lane at load

Would doing 2x 128-bit movu + 2x vinserti128 be faster?
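For reference, a minimal scalar C model (not the patch's code) of why the two sequences are interchangeable: `vpermq` with immediate 0x4e picks qwords 2,3,0,1, i.e. it swaps the two 128-bit lanes of the 32-byte load. Loading the upper 16 bytes into the low lane (roughly `movu xm1, [addr + 16]`) and inserting the lower 16 bytes into the high lane (roughly `vinserti128 m1, m1, [addr], 1`, where `addr` stands for the quoted address expression) gives the same register. The type `ymm_model` and the functions below are hypothetical names for illustration only; the model says nothing about speed, which would have to be measured on the target CPUs.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar model of a 256-bit ymm register as four qwords. */
typedef struct { uint64_t q[4]; } ymm_model;
static_assert(sizeof(ymm_model) == 32, "model must be exactly 32 bytes");

/* Model of "vpermq m, [mem], 0x4e": full 32-byte load, then each 2-bit
 * field of the immediate selects a source qword. 0x4e selects 2,3,0,1,
 * which swaps the low and high 128-bit lanes. */
static ymm_model load_vpermq_4e(const uint8_t *p)
{
    ymm_model src, dst;
    const unsigned imm = 0x4e;
    memcpy(src.q, p, 32);
    for (int i = 0; i < 4; i++)
        dst.q[i] = src.q[(imm >> (2 * i)) & 3];
    return dst;
}

/* Model of "movu xm, [mem + 16]" + "vinserti128 m, m, [mem], 1":
 * two 128-bit loads placed into the opposite lanes. */
static ymm_model load_two_halves_swapped(const uint8_t *p)
{
    ymm_model dst;
    memcpy((uint8_t *)dst.q, p + 16, 16);      /* low lane  <- bytes 16..31 */
    memcpy((uint8_t *)dst.q + 16, p, 16);      /* high lane <- bytes 0..15  */
    return dst;
}

/* Returns 1 if both load sequences yield identical register contents. */
static int same_result(const uint8_t *p)
{
    ymm_model a = load_vpermq_4e(p);
    ymm_model b = load_two_halves_swapped(p);
    return memcmp(&a, &b, sizeof a) == 0;
}
```

The equivalence holds for any 32 bytes of input, so the choice between the two forms can be made purely on measured throughput and latency.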
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
