On Sat, Mar 2, 2024 at 10:13 PM Kieran Kunhya <kier...@obe.tv> wrote:
> SPLATB_LOAD m0, r0+r1*0-1, m2
> SPLATB_LOAD m1, r0+r1*1-1, m2
This adds an unnecessary extra shuffle in the SSE2 code, because the macro splats to a full register. The easiest fix would probably be to unroll the macro and drop that shuffle by hand. That said, on x86-64 it might be faster to do a 1->8 byte splat with a GPR multiply by 0x0101010101010101.

_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".
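The GPR-multiply splat suggested in the reply can be sketched in portable C. This only illustrates the arithmetic. It is not code from the patch, and the function name `splat_byte_u64` is hypothetical. Multiplying a zero-extended byte b by 0x0101010101010101 is the same as adding b<<0, b<<8, ..., b<<56. Because b < 256, each partial product occupies its own byte, so no carries occur and every byte of the result equals b.

```c
#include <stdint.h>

/* Hypothetical helper: broadcast one byte into all 8 byte lanes of a
 * 64-bit value. b is zero-extended to 64 bits before the multiply, so
 * the partial products b<<0 ... b<<56 never overlap and never carry. */
static inline uint64_t splat_byte_u64(uint8_t b)
{
    return (uint64_t)b * UINT64_C(0x0101010101010101);
}
```

For example, `splat_byte_u64(0xAB)` yields `0xABABABABABABABAB`. In x86-64 assembly this would presumably become a `movzx` of the byte, an `imul` against a register preloaded with the constant (it does not fit in a sign-extended imm32), and a `movq` into the XMM register. Whether that actually beats the SSE2 shuffle sequence is the open question in the reply and would need benchmarking.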