On Mon, 18 Feb 2013, Daniel Kang wrote:

> +%macro PAVGBP_MMX 6
> +    mova %3, %1
> +    mova %6, %4
> +    por %3, %2
> +    por %6, %5
> +    pxor %2, %1
> +    pxor %5, %4
> +    pand %2, m6
> +    pand %5, m6
> +    psrlq %2, 1
> +    psrlq %5, 1
> +    psubb %3, %2
> +    psubb %6, %5
> +%endmacro
> +
> +%macro PAVGB_NRND_OP_MMX 4
> +    mova %3, %1
> +    pand %3, %2
> +    pxor %2, %1
> +    pand %2, %4
> +    psrlq %2, 1
> +    paddb %3, %2
> +%endmacro
> +
> +%macro PAVGBP_NO_RND_MMX 6
> +    PAVGB_NRND_OP_MMX %1, %2, %3, m6
> +    PAVGB_NRND_OP_MMX %4, %5, %6, m6
> +%endmacro
> +
> +%macro PAVGB_OP_MMX 4
> +    mova %3, %1
> +    por %3, %2
> +    pxor %2, %1
> +    pand %2, %4
> +    psrlq %2, 1
> +    psubb %3, %2
> +%endmacro

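For reference, a rough C sketch of what the two single-average macros quoted above appear to compute, treating one 64-bit integer as eight packed bytes. The names avg_rnd, avg_no_rnd and count_mismatches are made up for this illustration and are not part of the patch, and the 0xfe mask assumes m6 is set up with pcmpeqd + paddb as in the functions in this patch. It is only meant to show why one four-operand macro called twice is enough, not to stand in for the asm.

```c
#include <stdint.h>

/* Assumed contents of m6: pcmpeqd gives all ones, paddb doubles each
 * byte, so every byte becomes 0xfe. */
#define BYTE_FE   0xfefefefefefefefeULL
#define BYTE_ONES 0x0101010101010101ULL

/* Mirrors PAVGB_OP_MMX: (a | b) - (((a ^ b) & 0xfe) >> 1), i.e. the
 * per-byte average rounded up. No byte ever borrows, so a plain 64-bit
 * subtraction behaves like psubb here. */
static uint64_t avg_rnd(uint64_t a, uint64_t b)
{
    return (a | b) - (((a ^ b) & BYTE_FE) >> 1);
}

/* Mirrors PAVGB_NRND_OP_MMX: (a & b) + (((a ^ b) & 0xfe) >> 1), i.e. the
 * per-byte average rounded down. No byte ever carries, so a plain 64-bit
 * addition behaves like paddb here. */
static uint64_t avg_no_rnd(uint64_t a, uint64_t b)
{
    return (a & b) + (((a ^ b) & BYTE_FE) >> 1);
}

/* Compares both helpers against plain integer arithmetic for every pair
 * of byte values, replicated into all eight lanes. Returns the number of
 * lanes that disagree. */
static int count_mismatches(void)
{
    int bad = 0;
    for (unsigned a = 0; a < 256; a++) {
        for (unsigned b = 0; b < 256; b++) {
            uint64_t r = avg_rnd(a * BYTE_ONES, b * BYTE_ONES);
            uint64_t n = avg_no_rnd(a * BYTE_ONES, b * BYTE_ONES);
            for (int lane = 0; lane < 8; lane++) {
                if (((r >> (8 * lane)) & 0xff) != ((a + b + 1) >> 1))
                    bad++;
                if (((n >> (8 * lane)) & 0xff) != ((a + b) >> 1))
                    bad++;
            }
        }
    }
    return bad;
}
```

The mask is what keeps the low bit of one byte from shifting into the top bit of its neighbour, since psrlq shifts the whole 64-bit register rather than each byte.
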
I meant eliminate PAVGBP_MMX and PAVGBP_NO_RND_MMX entirely, and instead
call PAVGB_OP_MMX or PAVGB_NRND_OP_MMX twice from the functions that used
to use them.

> +; put_pixels8_x2(uint8_t *block, const uint8_t *pixels, ptrdiff_t line_size, int h)
> +%macro PUT_PIXELS8_X2_MMX 0-1
> +cglobal put%1_pixels8_x2, 4,4
> +    pcmpeqd m6, m6
> +    paddb m6, m6
> +.loop:
> +    mova m0, [r1]
> +    mova m1, [r1+1]
> +    mova m2, [r1+r2]
> +    mova m3, [r1+r2+1]
> +    PAVGBP m0, m1, m4, m2, m3, m5
> +    mova [r0], m4
> +    mova [r0+r2*1], m5
> +    lea r1, [r1+r2*2]
> +    lea r0, [r0+r2*2]
> +    mova m0, [r1]
> +    mova m1, [r1+1]
> +    mova m2, [r1+r2]
> +    mova m3, [r1+r2+1]
> +    PAVGBP m0, m1, m4, m2, m3, m5
> +    mova [r0], m4
> +    mova [r0+r2*1], m5
> +    lea r1, [r1+r2*2]
> +    lea r0, [r0+r2*2]
> +    sub r3d, 4
> +    jne .loop
> +    RET
> +%endmacro

%rep.

That said, I would have guessed that the original purpose of the unrolling
in most of the functions in this patch was to allow manual register
renaming to eliminate a few moves. In which case the functions where %rep
works are precisely the ones that shouldn't have been unrolled.

--Loren Merritt
_______________________________________________
libav-devel mailing list
https://lists.libav.org/mailman/listinfo/libav-devel
