Hi,

On Tue, May 1, 2012 at 1:49 PM, Justin Ruggles <[email protected]> wrote:
> +.loop:
> +    mulps      m0, m4, [srcq+2*lenq         ]
> +    mulps      m1, m4, [srcq+2*lenq+1*mmsize]
> +    mulps      m2, m4, [srcq+2*lenq+2*mmsize]
> +    mulps      m3, m4, [srcq+2*lenq+3*mmsize]
> +    cvtps2dq   m0, m0
> +    cvtps2dq   m1, m1
> +    cvtps2dq   m2, m2
> +    cvtps2dq   m3, m3

Is this (load+mul in the same instruction) actually faster than four loads followed by four muls? The load latency may make this slower, even though it's fewer instructions.

Ronald
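
To make the comparison concrete, the split variant asked about above might look roughly like the following. This is an untested sketch, not a proposed replacement: it assumes the same x86inc.asm macros and register roles as the quoted patch (m4 as the multiplier, srcq/lenq addressing the source), and that the source is aligned to mmsize so that movaps is legal.

```asm
; Hypothetical split variant of the quoted loop head (untested sketch).
; Assumption: m4 holds the multiplier and srcq/lenq address the input,
; exactly as in the quoted patch; src is assumed aligned to mmsize.
.loop:
    movaps     m0, [srcq+2*lenq         ]   ; issue all four loads first
    movaps     m1, [srcq+2*lenq+1*mmsize]
    movaps     m2, [srcq+2*lenq+2*mmsize]
    movaps     m3, [srcq+2*lenq+3*mmsize]
    mulps      m0, m4                       ; then multiply register by register
    mulps      m1, m4
    mulps      m2, m4
    mulps      m3, m4
    cvtps2dq   m0, m0                       ; float -> int32, as in the patch
    cvtps2dq   m1, m1
    cvtps2dq   m2, m2
    cvtps2dq   m3, m3
    ; rest of the loop unchanged from the patch (not shown in the quote)
```

As far as I know, in a non-AVX build x86inc expands the three-operand mulps into a register copy plus a two-operand mulps, so the instruction count may end up the same there and only differ for AVX. Whether either form is faster would have to be measured on the target CPUs.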
