+    mova  m0, [r0+r2*4+mmsize*0]
+    mova  m1, [r0+r2*4+mmsize*1]
+    mova  m2, [r0+r2*4+mmsize*2]
+    mova  m3, [r0+r2*4+mmsize*3]
+    paddd m0, m4
+    paddd m1, m4
+    paddd m2, m4
+    paddd m3, m4
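With AVX, couldn't this use the three-operand form and fold the load into the add, e.g. paddd m0, m4, [r0+r2*4+mmsize*0], or something along those lines? We might have to ifdef it, though: without AVX the three-operand form has to be emulated with a register-to-register mova followed by the paddd, and on Intel that reg-reg mova is an extra uop, whereas a load isn't.

Jason

_______________________________________________
libav-devel mailing list
https://lists.libav.org/mailman/listinfo/libav-devel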
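For readers less familiar with the asm, a rough C-intrinsics equivalent of the quoted block may help. It is only a sketch under stated assumptions: mmsize is 16 (xmm registers), r2 is an index counted in int32 elements, and the helper name add_bias_x4 is hypothetical, not taken from the patch. When built with AVX enabled (e.g. -mavx), GCC and Clang will typically fold each load into the add and emit vpaddd with a memory operand, which is the form suggested above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <emmintrin.h> /* SSE2: __m128i, _mm_loadu_si128, _mm_add_epi32 */

/* Hypothetical helper mirroring the quoted block with mmsize = 16:
 * load four vectors of int32 starting at buf + idx (r0 + r2*4 in the asm)
 * and add the same bias vector (m4) to each one. */
static void add_bias_x4(const int32_t *buf, ptrdiff_t idx, __m128i bias,
                        __m128i out[4])
{
    const int32_t *p = buf + idx; /* r0 + r2*4 */
    for (int k = 0; k < 4; k++) {
        /* mmsize*k bytes == 4*k int32 elements. Unaligned load for
         * simplicity; the asm's mova requires 16-byte alignment. */
        __m128i v = _mm_loadu_si128((const __m128i *)(p + 4 * k));
        /* paddd mK, m4: lane-wise 32-bit add of the bias. With AVX the
         * compiler can fold the load above into this add. */
        out[k] = _mm_add_epi32(v, bias);
    }
}
```

Usage: with buf[i] = i and a bias of 100 in every lane, add_bias_x4(buf, 2, _mm_set1_epi32(100), out) yields lanes holding buf[2..17] + 100.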
