On Mon, 18 Feb 2013, Daniel Kang wrote:

> +%macro PAVGBP_MMX 6
> +    mova   %3, %1
> +    mova   %6, %4
> +    por    %3, %2
> +    por    %6, %5
> +    pxor   %2, %1
> +    pxor   %5, %4
> +    pand   %2, m6
> +    pand   %5, m6
> +    psrlq  %2, 1
> +    psrlq  %5, 1
> +    psubb  %3, %2
> +    psubb  %6, %5
> +%endmacro
> +
> +%macro PAVGB_NRND_OP_MMX 4
> +    mova   %3, %1
> +    pand   %3, %2
> +    pxor   %2, %1
> +    pand   %2, %4
> +    psrlq  %2, 1
> +    paddb  %3, %2
> +%endmacro
> +
> +%macro PAVGBP_NO_RND_MMX 6
> +    PAVGB_NRND_OP_MMX %1, %2, %3, m6
> +    PAVGB_NRND_OP_MMX %4, %5, %6, m6
> +%endmacro
> +
> +%macro PAVGB_OP_MMX 4
> +    mova         %3, %1
> +    por          %3, %2
> +    pxor         %2, %1
> +    pand         %2, %4
> +    psrlq        %2, 1
> +    psubb        %3, %2
> +%endmacro

I meant eliminate PAVGBP_MMX and PAVGBP_NO_RND_MMX entirely, and instead
call PAVGB_OP_MMX or PAVGB_NRND_OP_MMX twice from the functions that used
to use them.
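
As a rough illustration of why nothing is lost by dropping the 6-argument
wrappers, here is a per-byte C model of the two 4-argument macros. It is only
a sketch: the C function names and avg_two_rows are hypothetical, not anything
in the patch, and the mask argument stands in for m6 (0xFE in every byte after
pcmpeqd/paddb). The 6-argument forms are just the same operation applied to two
independent register pairs.

```c
#include <assert.h>
#include <stdint.h>

/* Per-byte model of PAVGB_OP_MMX: rounding average, (a + b + 1) >> 1.
 * In the asm the mask keeps psrlq from shifting a bit across a byte
 * boundary; on a single byte it changes nothing. */
static uint8_t pavgb_op(uint8_t a, uint8_t b, uint8_t mask)
{
    uint8_t t = (uint8_t)(a | b);            /* mova %3, %1 ; por  %3, %2 */
    uint8_t x = (uint8_t)((a ^ b) & mask);   /* pxor %2, %1 ; pand %2, %4 */
    return (uint8_t)(t - (x >> 1));          /* psrlq %2, 1 ; psubb %3, %2 */
}

/* Per-byte model of PAVGB_NRND_OP_MMX: truncating average, (a + b) >> 1. */
static uint8_t pavgb_nrnd_op(uint8_t a, uint8_t b, uint8_t mask)
{
    uint8_t t = (uint8_t)(a & b);            /* mova %3, %1 ; pand %3, %2 */
    uint8_t x = (uint8_t)((a ^ b) & mask);   /* pxor %2, %1 ; pand %2, %4 */
    return (uint8_t)(t + (x >> 1));          /* psrlq %2, 1 ; paddb %3, %2 */
}

/* Hypothetical stand-in for a caller that used the 6-argument PAVGBP:
 * it simply invokes the 4-argument operation twice. */
static void avg_two_rows(const uint8_t a[2], const uint8_t b[2], uint8_t out[2])
{
    out[0] = pavgb_op(a[0], b[0], 0xFE);
    out[1] = pavgb_op(a[1], b[1], 0xFE);
}

/* Exhaustively compare both models against the plain arithmetic
 * definitions; returns the number of disagreements. */
static int count_mismatches(void)
{
    int bad = 0;
    for (int a = 0; a < 256; a++) {
        for (int b = 0; b < 256; b++) {
            bad += pavgb_op((uint8_t)a, (uint8_t)b, 0xFE) != ((a + b + 1) >> 1);
            bad += pavgb_nrnd_op((uint8_t)a, (uint8_t)b, 0xFE) != ((a + b) >> 1);
        }
    }
    return bad;
}
```

The model relies on the identity a + b = 2*(a|b) - (a^b) = 2*(a&b) + (a^b),
which is why the rounding variant subtracts from the OR and the no-rounding
variant adds to the AND.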

> +; put_pixels8_x2(uint8_t *block, const uint8_t *pixels, ptrdiff_t line_size, int h)
> +%macro PUT_PIXELS8_X2_MMX 0-1
> +cglobal put%1_pixels8_x2, 4,4
> +    pcmpeqd      m6, m6
> +    paddb        m6, m6
> +.loop:
> +    mova         m0, [r1]
> +    mova         m1, [r1+1]
> +    mova         m2, [r1+r2]
> +    mova         m3, [r1+r2+1]
> +    PAVGBP       m0, m1, m4, m2, m3, m5
> +    mova       [r0], m4
> +    mova  [r0+r2*1], m5
> +    lea          r1, [r1+r2*2]
> +    lea          r0, [r0+r2*2]
> +    mova         m0, [r1]
> +    mova         m1, [r1+1]
> +    mova         m2, [r1+r2]
> +    mova         m3, [r1+r2+1]
> +    PAVGBP       m0, m1, m4, m2, m3, m5
> +    mova       [r0], m4
> +    mova  [r0+r2*1], m5
> +    lea          r1, [r1+r2*2]
> +    lea          r0, [r0+r2*2]
> +    sub         r3d, 4
> +    jne .loop
> +    RET
> +%endmacro

The duplicated loop body could use %rep.
That said, I would have guessed that the original purpose of the
unrolling in most of the functions in this patch was to allow manual
register renaming to eliminate a few moves. In which case the functions
where %rep works are precisely the ones that shouldn't have been
unrolled.

--Loren Merritt