Hi,

On Tue, May 1, 2012 at 1:49 PM, Justin Ruggles <[email protected]> wrote:
> +.loop:
> +    mulps     m0, m4, [srcq+2*lenq         ]
> +    mulps     m1, m4, [srcq+2*lenq+1*mmsize]
> +    mulps     m2, m4, [srcq+2*lenq+2*mmsize]
> +    mulps     m3, m4, [srcq+2*lenq+3*mmsize]
> +    cvtps2dq  m0, m0
> +    cvtps2dq  m1, m1
> +    cvtps2dq  m2, m2
> +    cvtps2dq  m3, m3

Is this (load+mul in the same instruction) actually faster than 4 loads
followed by 4 muls? The load latency may make this slower, even though
it's fewer instructions.
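
For reference, the load-first variant would look roughly like the
following. This is an untested sketch, not a measured improvement: it
assumes the same register allocation as the patch (m4 holds the scale
factor) and that src is aligned, which the non-AVX memory-operand form
of mulps already requires. Which version wins would have to be
benchmarked on the target CPUs.

```asm
.loop:
    ; issue all four loads first so their latencies can overlap
    movaps    m0, [srcq+2*lenq         ]
    movaps    m1, [srcq+2*lenq+1*mmsize]
    movaps    m2, [srcq+2*lenq+2*mmsize]
    movaps    m3, [srcq+2*lenq+3*mmsize]
    ; then scale each register by the factor in m4
    mulps     m0, m4
    mulps     m1, m4
    mulps     m2, m4
    mulps     m3, m4
    ; convert the scaled floats to 32-bit integers, as in the patch
    cvtps2dq  m0, m0
    cvtps2dq  m1, m1
    cvtps2dq  m2, m2
    cvtps2dq  m3, m3
```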

Ronald
_______________________________________________
libav-devel mailing list
[email protected]
https://lists.libav.org/mailman/listinfo/libav-devel
