+    mova            m0, [r0+r2*4+mmsize*0]
+    mova            m1, [r0+r2*4+mmsize*1]
+    mova            m2, [r0+r2*4+mmsize*2]
+    mova            m3, [r0+r2*4+mmsize*3]
+    paddd           m0, m4
+    paddd           m1, m4
+    paddd           m2, m4
+    paddd           m3, m4

In AVX, can't this be:

paddd m0, m4, [r0+r2*4+mmsize*0]

or something of the sort?

We might have to ifdef it, because without AVX the three-operand form
gets emulated with a reg-reg mova followed by the paddd, and a mova
between regs is an extra uop on Intel, whereas a load isn't.
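
Roughly what I have in mind, as an untested sketch. It assumes x86inc's
cpuflag() is usable at this point in the function and that m4 still holds
the value being added, as in the patch; the %else branch is just the code
as posted:

    %if cpuflag(avx)
        ; AVX: non-destructive three-operand form, the load is folded
        ; into the add and no separate mova is needed
        paddd           m0, m4, [r0+r2*4+mmsize*0]
        paddd           m1, m4, [r0+r2*4+mmsize*1]
        paddd           m2, m4, [r0+r2*4+mmsize*2]
        paddd           m3, m4, [r0+r2*4+mmsize*3]
    %else
        ; pre-AVX: keep the explicit loads so that no reg-reg mova
        ; gets emitted
        mova            m0, [r0+r2*4+mmsize*0]
        mova            m1, [r0+r2*4+mmsize*1]
        mova            m2, [r0+r2*4+mmsize*2]
        mova            m3, [r0+r2*4+mmsize*3]
        paddd           m0, m4
        paddd           m1, m4
        paddd           m2, m4
        paddd           m3, m4
    %endif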

Jason