On Fri, 15 Feb 2013, Daniel Kang wrote:

> +%macro PAVGBP_MMX 6
> +    mova   %3, %1
> +    mova   %6, %4
> +    por    %3, %2
> +    por    %6, %5
> +    pxor   %2, %1
> +    pxor   %5, %4
> +    pand   %2, m6
> +    pand   %5, m6
> +    psrlq  %2, 1
> +    psrlq  %5, 1
> +    psubb  %3, %2
> +    psubb  %6, %5
> +%endmacro
> +
> +%macro PAVGBP_NO_RND_MMX 6
> +    mova         %3, %1
> +    mova         %6, %4
> +    pand         %3, %2
> +    pand         %6, %5
> +    pxor         %2, %1
> +    pxor         %5, %4
> +    pand         %2, m6
> +    pand         %5, m6
> +    psrlq        %2, 1
> +    psrlq        %5, 1
> +    paddb        %3, %2
> +    paddb        %6, %5
> +%endmacro

Does this need to be interleaved, or could it just be two calls to
PAVGB_OP_MMX?
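
For reference, the arithmetic both macros rely on can be checked with a small scalar model. This is only a sketch: it assumes m6 holds 0xFE in every byte (the quoted hunk does not show how m6 is set up), and the helper names are hypothetical, not taken from the patch.

```c
#include <stdint.h>

/* Assumption: m6 = 0xFE in every byte, so the 64-bit psrlq cannot move
 * the low bit of one byte lane into the lane below it. */
#define BFE_MASK 0xFEFEFEFEFEFEFEFEULL

/* Model of paddb/psubb: per-byte add or subtract, no carry across lanes. */
static uint64_t bytewise(uint64_t a, uint64_t b, int subtract)
{
    uint64_t r = 0;
    for (int i = 0; i < 8; i++) {
        uint8_t x = (uint8_t)(a >> (8 * i));
        uint8_t y = (uint8_t)(b >> (8 * i));
        uint8_t z = subtract ? (uint8_t)(x - y) : (uint8_t)(x + y);
        r |= (uint64_t)z << (8 * i);
    }
    return r;
}

/* PAVGBP_MMX step: (a | b) - (((a ^ b) & 0xFE) >> 1) == (a + b + 1) >> 1 */
static uint64_t avg_rnd(uint64_t a, uint64_t b)
{
    return bytewise(a | b, ((a ^ b) & BFE_MASK) >> 1, 1);
}

/* PAVGBP_NO_RND_MMX step: (a & b) + (((a ^ b) & 0xFE) >> 1) == (a + b) >> 1 */
static uint64_t avg_no_rnd(uint64_t a, uint64_t b)
{
    return bytewise(a & b, ((a ^ b) & BFE_MASK) >> 1, 0);
}
```

Each pair of operands goes through the same three-step sequence, so in this model the six-operand form is just the three-operand form applied twice; whether interleaving the two helps scheduling is a separate question.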

> +; put_pixels8_xy2(uint8_t *block, const uint8_t *pixels, int line_size, int h)
> +%macro PUT_PIXELS8_XY2_MMX 0-1
> +cglobal put%1_pixels8_xy2, 4,5
> +    pxor         m7, m7
> +    SET_RND(m6)
> +    mova         m0, [r1]
> +    mova         m4, [r1+1]
> +    mova         m1, m0
> +    mova         m5, m4
> +    punpcklbw    m0, m7
> +    punpcklbw    m4, m7
> +    punpckhbw    m1, m7
> +    punpckhbw    m5, m7
> +    paddusw      m4, m0
> +    paddusw      m5, m1
> +    xor          r4, r4
> +    add          r1, r2
> +.loop:
> +    mova         m0, [r1+r4]
> +    mova         m2, [r1+r4+1]
> +    mova         m1, m0
> +    mova         m3, m2
> +    punpcklbw    m0, m7
> +    punpcklbw    m2, m7
> +    punpckhbw    m1, m7
> +    punpckhbw    m3, m7
> +    paddusw      m0, m2
> +    paddusw      m1, m3
> +    paddusw      m4, m6
> +    paddusw      m5, m6
> +    paddusw      m4, m0
> +    paddusw      m5, m1
> +    psrlw        m4, 2
> +    psrlw        m5, 2
> +    packuswb     m4, m5
> +    mova    [r0+r4], m4
> +    add          r4, r2
> +    mova         m2, [r1+r4]
> +    mova         m3, [r1+r4+1]
> +    mova         m3, m2
> +    mova         m5, m4
> +    punpcklbw    m2, m7
> +    punpcklbw    m4, m7
> +    punpckhbw    m3, m7
> +    punpckhbw    m5, m7
> +    paddusw      m4, m2
> +    paddusw      m5, m3
> +    paddusw      m0, m6
> +    paddusw      m1, m6
> +    paddusw      m0, m4
> +    paddusw      m1, m5
> +    psrlw        m0, 2
> +    psrlw        m1, 2
> +    packuswb     m0, m1
> +    mova    [r0+r4], m0
> +    add          r4, r2
> +    sub         r3d, 2
> +    jne .loop
> +    RET
> +%endmacro

Do this and similar functions really need to be unrolled? If so, use
%rep.
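
For context, here is a scalar sketch of what one loop iteration computes per output row. The signature follows the comment in the patch; the `rnd` parameter and the function name are hypothetical, and the value 2 for the rounding variant (1 for no_rnd) is an assumption about what SET_RND loads into m6.

```c
#include <stddef.h>
#include <stdint.h>

/* Scalar model of put_pixels8_xy2: each output byte is the average of a
 * 2x2 neighbourhood. rnd is assumed to be 2 (rounding) or 1 (no_rnd). */
static void put_pixels8_xy2_ref(uint8_t *block, const uint8_t *pixels,
                                ptrdiff_t line_size, int h, int rnd)
{
    uint16_t prev[8];

    /* Horizontal pair sums of the first row, computed before the loop. */
    for (int x = 0; x < 8; x++)
        prev[x] = (uint16_t)(pixels[x] + pixels[x + 1]);
    pixels += line_size;

    for (int y = 0; y < h; y++) {
        for (int x = 0; x < 8; x++) {
            /* Pair sums of the next row; reused as "prev" for the row after. */
            uint16_t cur = (uint16_t)(pixels[x] + pixels[x + 1]);
            block[x] = (uint8_t)((prev[x] + cur + rnd) >> 2);
            prev[x] = cur;
        }
        pixels += line_size;
        block  += line_size;
    }
}
```

The only state carried from one row to the next is the previous row's pair sums. The two halves of the unrolled asm loop do the same work with the roles of the register pairs exchanged, which is why a single body repeated with %rep looks feasible.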

--Loren Merritt