On Fri, 15 Feb 2013, Daniel Kang wrote:

> +%macro PAVGBP_MMX 6
> +    mova %3, %1
> +    mova %6, %4
> +    por %3, %2
> +    por %6, %5
> +    pxor %2, %1
> +    pxor %5, %4
> +    pand %2, m6
> +    pand %5, m6
> +    psrlq %2, 1
> +    psrlq %5, 1
> +    psubb %3, %2
> +    psubb %6, %5
> +%endmacro
> +
> +%macro PAVGBP_NO_RND_MMX 6
> +    mova %3, %1
> +    mova %6, %4
> +    pand %3, %2
> +    pand %6, %5
> +    pxor %2, %1
> +    pxor %5, %4
> +    pand %2, m6
> +    pand %5, m6
> +    psrlq %2, 1
> +    psrlq %5, 1
> +    paddb %3, %2
> +    paddb %6, %5
> +%endmacro

Does this need to be interleaved, not just two calls to PAVGB_OP_MMX?

> +; put_pixels8_xy2(uint8_t *block, const uint8_t *pixels, int line_size, int h)
> +%macro PUT_PIXELS8_XY2_MMX 0-1
> +cglobal put%1_pixels8_xy2, 4,5
> +    pxor m7, m7
> +    SET_RND(m6)
> +    mova m0, [r1]
> +    mova m4, [r1+1]
> +    mova m1, m0
> +    mova m5, m4
> +    punpcklbw m0, m7
> +    punpcklbw m4, m7
> +    punpckhbw m1, m7
> +    punpckhbw m5, m7
> +    paddusw m4, m0
> +    paddusw m5, m1
> +    xor r4, r4
> +    add r1, r2
> +.loop:
> +    mova m0, [r1+r4]
> +    mova m2, [r1+r4+1]
> +    mova m1, m0
> +    mova m3, m2
> +    punpcklbw m0, m7
> +    punpcklbw m2, m7
> +    punpckhbw m1, m7
> +    punpckhbw m3, m7
> +    paddusw m0, m2
> +    paddusw m1, m3
> +    paddusw m4, m6
> +    paddusw m5, m6
> +    paddusw m4, m0
> +    paddusw m5, m1
> +    psrlw m4, 2
> +    psrlw m5, 2
> +    packuswb m4, m5
> +    mova [r0+r4], m4
> +    add r4, r2
> +    mova m2, [r1+r4]
> +    mova m3, [r1+r4+1]
> +    mova m3, m2
> +    mova m5, m4
> +    punpcklbw m2, m7
> +    punpcklbw m4, m7
> +    punpckhbw m3, m7
> +    punpckhbw m5, m7
> +    paddusw m4, m2
> +    paddusw m5, m3
> +    paddusw m0, m6
> +    paddusw m1, m6
> +    paddusw m0, m4
> +    paddusw m1, m5
> +    psrlw m0, 2
> +    psrlw m1, 2
> +    packuswb m0, m1
> +    mova [r0+r4], m0
> +    add r4, r2
> +    sub r3d, 2
> +    jne .loop
> +    RET
> +%endmacro

Do this and similar functions really need to be unrolled? If so, use %rep.

--Loren Merritt
_______________________________________________
libav-devel mailing list
[email protected]
https://lists.libav.org/mailman/listinfo/libav-devel
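
For readers following the quoted PAVGBP_MMX and PAVGBP_NO_RND_MMX macros, here is a rough scalar model in C of what each half of those macros appears to compute on one 64-bit register. It is only a sketch under one assumption the quoted patch does not state: that m6 holds 0xfe in every byte, so that the 64-bit psrlq cannot shift a bit from one byte into its neighbour. The names BFE_MASK, pavgb_rnd, pavgb_no_rnd and pavgb_identities_hold are made up for this illustration and are not part of the patch.

```c
#include <stdint.h>

/* ASSUMPTION: m6 contains 0xfe in each byte lane (not shown in the quoted patch). */
#define BFE_MASK UINT64_C(0xfefefefefefefefe)

/* Model of one half of PAVGBP_MMX: por, pxor, pand m6, psrlq 1, psubb.
 * Per byte this is (a | b) - ((a ^ b) >> 1), which equals (a + b + 1) >> 1.
 * No byte ever borrows from its neighbour, so a 64-bit subtract matches psubb. */
static uint64_t pavgb_rnd(uint64_t a, uint64_t b)
{
    return (a | b) - (((a ^ b) & BFE_MASK) >> 1);
}

/* Model of one half of PAVGBP_NO_RND_MMX: pand, pxor, pand m6, psrlq 1, paddb.
 * Per byte this is (a & b) + ((a ^ b) >> 1), which equals (a + b) >> 1.
 * No byte ever carries into its neighbour, so a 64-bit add matches paddb. */
static uint64_t pavgb_no_rnd(uint64_t a, uint64_t b)
{
    return (a & b) + (((a ^ b) & BFE_MASK) >> 1);
}

/* Checks both models against the plain per-byte averages for every byte pair.
 * Lanes alternate between different values so cross-lane leakage would show up.
 * Returns 1 if every lane matches, 0 otherwise. */
static int pavgb_identities_hold(void)
{
    const uint64_t even = UINT64_C(0x0001000100010001); /* lanes 0, 2, 4, 6 */
    const uint64_t odd  = UINT64_C(0x0100010001000100); /* lanes 1, 3, 5, 7 */

    for (unsigned x = 0; x < 256; x++) {
        for (unsigned y = 0; y < 256; y++) {
            uint64_t a = (uint64_t)x * even | (uint64_t)y * odd;
            uint64_t b = (uint64_t)y * even | (uint64_t)(x ^ 0xffu) * odd;
            uint64_t r = pavgb_rnd(a, b);
            uint64_t n = pavgb_no_rnd(a, b);

            for (int i = 0; i < 8; i++) {
                unsigned p = (unsigned)((a >> (8 * i)) & 0xff);
                unsigned q = (unsigned)((b >> (8 * i)) & 0xff);
                if (((r >> (8 * i)) & 0xff) != ((p + q + 1) >> 1))
                    return 0;
                if (((n >> (8 * i)) & 0xff) != ((p + q) >> 1))
                    return 0;
            }
        }
    }
    return 1;
}
```

Under that assumption, each macro is two independent copies of the same five-instruction sequence, one on arguments %1..%3 and one on %4..%6, which is what the interleaving question above is about.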
