On Sat, Mar 2, 2024 at 10:13 PM Kieran Kunhya <kier...@obe.tv> wrote:
>      SPLATB_LOAD m0, r0+r1*0-1, m2
>      SPLATB_LOAD m1, r0+r1*1-1, m2

This adds an extra unnecessary shuffle in the SSE2 code as it splats
to a full register. The easiest way of fixing it would probably be to
unroll the macro and manually get rid of it.

That said, on x86-64 it might be faster to do the 1->8 byte splat in a
GPR instead, by multiplying the zero-extended byte by 0x0101010101010101.
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".