On Fri, Feb 15, 2013 at 6:33 PM, Loren Merritt <[email protected]> wrote:
> On Fri, 15 Feb 2013, Daniel Kang wrote:
>
>> +%macro PAVGBP_MMX 6
>> +    mova   %3, %1
>> +    mova   %6, %4
>> +    por    %3, %2
>> +    por    %6, %5
>> +    pxor   %2, %1
>> +    pxor   %5, %4
>> +    pand   %2, m6
>> +    pand   %5, m6
>> +    psrlq  %2, 1
>> +    psrlq  %5, 1
>> +    psubb  %3, %2
>> +    psubb  %6, %5
>> +%endmacro
>> +
>> +%macro PAVGBP_NO_RND_MMX 6
>> +    mova         %3, %1
>> +    mova         %6, %4
>> +    pand         %3, %2
>> +    pand         %6, %5
>> +    pxor         %2, %1
>> +    pxor         %5, %4
>> +    pand         %2, m6
>> +    pand         %5, m6
>> +    psrlq        %2, 1
>> +    psrlq        %5, 1
>> +    paddb        %3, %2
>> +    paddb        %6, %5
>> +%endmacro
>
> Does this need to be interleaved, not just two calls to PAVGB_OP_MMX?

No, fixed.
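For reference, here is a scalar C sketch of what PAVGBP_MMX and PAVGBP_NO_RND_MMX compute on each 64-bit register. It assumes that m6 holds 0xFE in every byte, with that mask set up outside the quoted hunk. Masking before the 64-bit psrlq keeps bits from crossing byte lanes. The function names are hypothetical. This models the two identities and is not code from the patch.

```c
#include <stdint.h>

/* Assumed contents of m6: 0xFE in every byte. Clearing each byte's LSB
 * before the 64-bit shift stops bits from leaking into the lane below. */
#define BFE_MASK 0xFEFEFEFEFEFEFEFEULL

/* PAVGBP_MMX model: per byte, (a + b + 1) >> 1 == (a | b) - ((a ^ b) >> 1).
 * No byte borrows from its neighbour, so one 64-bit subtract suffices. */
static uint64_t pavgb_rnd_model(uint64_t a, uint64_t b)
{
    return (a | b) - (((a ^ b) & BFE_MASK) >> 1);
}

/* PAVGBP_NO_RND_MMX model: per byte, (a + b) >> 1 == (a & b) + ((a ^ b) >> 1).
 * No byte carries into its neighbour, so one 64-bit add suffices. */
static uint64_t pavgb_no_rnd_model(uint64_t a, uint64_t b)
{
    return (a & b) + (((a ^ b) & BFE_MASK) >> 1);
}

/* Byte-by-byte reference for comparison; rnd is 1 (round up) or 0 (truncate). */
static uint64_t avg_bytes_ref(uint64_t a, uint64_t b, unsigned rnd)
{
    uint64_t r = 0;
    for (int i = 0; i < 8; i++) {
        unsigned x = (unsigned)(a >> (8 * i)) & 0xFF;
        unsigned y = (unsigned)(b >> (8 * i)) & 0xFF;
        r |= (uint64_t)((x + y + rnd) >> 1) << (8 * i);
    }
    return r;
}
```

The two macros differ only in their base term (a | b versus a & b) and in whether the halved XOR is subtracted or added. Each PAVGBP call runs two independent averages, so the interleaving only affects instruction scheduling.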

>> +; put_pixels8_xy2(uint8_t *block, const uint8_t *pixels, int line_size, int h)
>> +%macro PUT_PIXELS8_XY2_MMX 0-1
>> +cglobal put%1_pixels8_xy2, 4,5
>> +    pxor         m7, m7
>> +    SET_RND(m6)
>> +    mova         m0, [r1]
>> +    mova         m4, [r1+1]
>> +    mova         m1, m0
>> +    mova         m5, m4
>> +    punpcklbw    m0, m7
>> +    punpcklbw    m4, m7
>> +    punpckhbw    m1, m7
>> +    punpckhbw    m5, m7
>> +    paddusw      m4, m0
>> +    paddusw      m5, m1
>> +    xor          r4, r4
>> +    add          r1, r2
>> +.loop:
>> +    mova         m0, [r1+r4]
>> +    mova         m2, [r1+r4+1]
>> +    mova         m1, m0
>> +    mova         m3, m2
>> +    punpcklbw    m0, m7
>> +    punpcklbw    m2, m7
>> +    punpckhbw    m1, m7
>> +    punpckhbw    m3, m7
>> +    paddusw      m0, m2
>> +    paddusw      m1, m3
>> +    paddusw      m4, m6
>> +    paddusw      m5, m6
>> +    paddusw      m4, m0
>> +    paddusw      m5, m1
>> +    psrlw        m4, 2
>> +    psrlw        m5, 2
>> +    packuswb     m4, m5
>> +    mova    [r0+r4], m4
>> +    add          r4, r2
>> +    mova         m2, [r1+r4]
>> +    mova         m4, [r1+r4+1]
>> +    mova         m3, m2
>> +    mova         m5, m4
>> +    punpcklbw    m2, m7
>> +    punpcklbw    m4, m7
>> +    punpckhbw    m3, m7
>> +    punpckhbw    m5, m7
>> +    paddusw      m4, m2
>> +    paddusw      m5, m3
>> +    paddusw      m0, m6
>> +    paddusw      m1, m6
>> +    paddusw      m0, m4
>> +    paddusw      m1, m5
>> +    psrlw        m0, 2
>> +    psrlw        m1, 2
>> +    packuswb     m0, m1
>> +    mova    [r0+r4], m0
>> +    add          r4, r2
>> +    sub         r3d, 2
>> +    jne .loop
>> +    RET
>> +%endmacro
>
> Do this and similar functions really need to be unrolled? If so, use
> %rep.

Yes, because of the way this is written. I converted the one I could to %rep.
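Here is a scalar C sketch of PUT_PIXELS8_XY2_MMX that shows why the loop body comes in two halves. As I read the asm, each output byte is (A + B + C + D + rounder) >> 2 over the 2x2 source neighbourhood. The rounder in m6 is assumed to be 2 for put_ and 1 for put_no_rnd_, as in the C reference. The horizontal sum of each source row is computed once and carried into the next output row. In the asm the carried sums live in m4/m5 during the first half and in m0/m1 during the second, so a plain %rep of one half does not work unless the register names are also swapped. put_pixels8_xy2_ref and its rnd flag are hypothetical names.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar model of put[_no_rnd]_pixels8_xy2.
 * block and pixels share line_size, as in the asm (r4 offsets both). */
static void put_pixels8_xy2_ref(uint8_t *block, const uint8_t *pixels,
                                int line_size, int h, int rnd)
{
    /* Assumed rounder in m6: 2 for put_, 1 for put_no_rnd_. */
    const unsigned rounder = rnd ? 2 : 1;
    unsigned prev[8], cur[8];

    /* Horizontal sums of the first source row (m4/m5 before .loop). */
    for (int x = 0; x < 8; x++)
        prev[x] = pixels[x] + pixels[x + 1];
    pixels += line_size;

    for (int y = 0; y < h; y++) {
        for (int x = 0; x < 8; x++) {
            /* Sum of the next source row (m0/m1 or m4/m5, alternating). */
            cur[x] = pixels[x] + pixels[x + 1];
            block[x] = (uint8_t)((prev[x] + cur[x] + rounder) >> 2);
        }
        /* Carry this row's sums forward; the asm swaps registers instead. */
        memcpy(prev, cur, sizeof(prev));
        pixels += line_size;
        block  += line_size;
    }
}
```

The asm emits two output rows per iteration (sub r3d, 2), so each register pair returns to its original role after every second row.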
_______________________________________________
libav-devel mailing list
[email protected]
https://lists.libav.org/mailman/listinfo/libav-devel
