Hi,

On Wed, Jul 30, 2014 at 5:04 PM, James Almer <jamr...@gmail.com> wrote:
> On 30/07/14 10:33 AM, Pierre Edouard Lepere wrote:
>
> > +%macro TR_ADD_INIT_SSE_8 2
> > +    movu              m4, [r1]
> > +    movu              m6, [r1+16]
> > +    movu              m8, [r1+32]
> > +    movu             m10, [r1+48]
>
> You can use mova here, and probably in every other movu as well.
>
> > +    lea               %1, [%2*3]
> > +    pxor              m5, m5
> > +    psubw             m5, m4
> > +    packuswb          m4, m4
> > +    packuswb          m5, m5
> > +    pxor              m7, m7
> > +    psubw             m7, m6
> > +    packuswb          m6, m6
> > +    packuswb          m7, m7
> > +    pxor              m9, m9
> > +    psubw             m9, m8
> > +    packuswb          m8, m8
> > +    packuswb          m9, m9
> > +    pxor             m11, m11
> > +    psubw            m11, m10
> > +    packuswb         m10, m10
> > +    packuswb         m11, m11
> > +%endmacro
> >
> > +%macro TR_ADD_OP_SSE 4
> > +    %1                m0, [%2     ]
> > +    %1                m1, [%2+%3  ]
> > +    %1                m2, [%2+%3*2]
> > +    %1                m3, [%2+%4  ]
> > +    paddusb           m0, m4
> > +    paddusb           m1, m6
> > +    paddusb           m2, m8
> > +    paddusb           m3, m10
> > +    psubusb           m0, m5
> > +    psubusb           m1, m7
> > +    psubusb           m2, m9
> > +    psubusb           m3, m11
> > +    %1       [%2     ], m0
> > +    %1       [%2+%3  ], m1
> > +    %1       [%2+2*%3], m2
> > +    %1       [%2+%4  ], m3
> > +%endmacro
>
> You can use packuswb to pack two regs into one, like you did in
> TR_ADD_INIT_SSE_16.
> Then you simply use movq+movhps to load and store data, like so:
>
> %macro TR_ADD_INIT_SSE_8 2
>     mova              m4, [r1]
>     mova              m6, [r1+16]
>     mova              m0, [r1+32]
>     mova              m2, [r1+48]
>     lea               %1, [%2*3]
>     pxor              m5, m5
>     psubw             m5, m4
>     pxor              m7, m7
>     psubw             m7, m6
>     pxor              m1, m1
>     psubw             m1, m0
>     packuswb          m4, m0
>     packuswb          m5, m1
>     pxor              m3, m3
>     psubw             m3, m2
>     packuswb          m6, m2
>     packuswb          m7, m3
> %endmacro
>
> %macro TR_ADD_OP_SSE 4
>     movq              m0, [%2     ]
>     movq              m1, [%2+%3  ]
>     movhps            m0, [%2+%3*2]
>     movhps            m1, [%2+%4  ]
>     paddusb           m0, m4
>     paddusb           m1, m6
>     psubusb           m0, m5
>     psubusb           m1, m7
>     movq     [%2     ], m0
>     movq     [%2+%3  ], m1
>     movhps   [%2+2*%3], m0
>     movhps   [%2+%4  ], m1
> %endmacro

Why all these memory round-trips?

Ronald
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
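
For readers following the quoted macros: both versions appear to rely on the same trick. The 16-bit residual is split into a non-negative part (packuswb of the residual) and a negated part (pxor/psubw, then packuswb), and the pixel is then updated with paddusb followed by psubusb. Below is a minimal scalar sketch in C of what one byte lane does, checked against a plain add-and-clip. It is not code from the patch, and all function names are hypothetical, chosen only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Reference behaviour: add a 16-bit residual to an 8-bit pixel, clip to 0..255. */
static uint8_t clip_add_ref(uint8_t dst, int16_t res)
{
    int v = (int)dst + (int)res;
    return (uint8_t)(v < 0 ? 0 : v > 255 ? 255 : v);
}

/* One lane of packuswb: signed 16-bit to unsigned 8-bit with saturation. */
static uint8_t pack_us(int16_t v)
{
    return (uint8_t)(v < 0 ? 0 : v > 255 ? 255 : v);
}

/* One lane of "pxor x, x; psubw x, v": 16-bit negate with wraparound. */
static int16_t neg_wrap16(int16_t v)
{
    return v == INT16_MIN ? INT16_MIN : (int16_t)-v;
}

/* One lane of paddusb: unsigned saturating add. */
static uint8_t addus8(uint8_t a, uint8_t b)
{
    unsigned s = (unsigned)a + b;
    return (uint8_t)(s > 255 ? 255 : s);
}

/* One lane of psubusb: unsigned saturating subtract. */
static uint8_t subus8(uint8_t a, uint8_t b)
{
    return (uint8_t)(a > b ? a - b : 0);
}

/* Scalar model of the sequence used in the macros (hypothetical name). */
uint8_t clip_add_simd_style(uint8_t dst, int16_t res)
{
    uint8_t pos = pack_us(res);             /* max(res, 0), capped at 255  */
    uint8_t neg = pack_us(neg_wrap16(res)); /* max(-res, 0), capped at 255 */
    return subus8(addus8(dst, pos), neg);
}

/* Count disagreements with the reference for every pixel value and every
 * residual in [-32767, 32767]. INT16_MIN is left out on purpose: the 16-bit
 * negate wraps there, so both packed parts become 0. */
long count_mismatches(void)
{
    long bad = 0;
    for (int d = 0; d <= 255; d++)
        for (int r = -32767; r <= 32767; r++)
            if (clip_add_simd_style((uint8_t)d, (int16_t)r) !=
                clip_add_ref((uint8_t)d, (int16_t)r))
                bad++;
    return bad;
}
```

At most one of the two packed parts is nonzero for any given residual, so the add and the subtract never interfere with each other. This is also why packing two rows of residuals into one register, as suggested above, should leave the result unchanged: the lanes are independent.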