Hi,

Now for the actual asm:

On Sat, Jan 5, 2013 at 9:01 AM, Daniel Kang <[email protected]> wrote:
> +%macro CHECK 2
> +    movu      m2, [curq+mrefsq+%1]
> +    movu      m3, [curq+prefsq+%2]
> +    mova      m4, m2
> +    mova      m5, m2
> +    pxor      m4, m3
> +    pavgb     m5, m3

pxor m4, m2, m3
pavgb m5, m2, m3
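
x86inc expands the three-operand form for you: under INIT_XMM avx it
assembles to the VEX non-destructive encoding, and in SSE mode it
emits the mova + two-operand pair itself, so the explicit movas above
become redundant. Roughly:

pxor  m4, m2, m3   ; avx:  vpxor xmm4, xmm2, xmm3
                   ; sse2: mova m4, m2 / pxor m4, m3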

> +    mova      m4, m2
> +    psubusb   m2, m3
> +    psubusb   m3, m4
> +    pmaxub    m2, m3

psubusb m4, m3, m2   ; m4 = max(m3 - m2, 0)
psubusb m2, m3       ; m2 = max(m2 - m3, 0)
pmaxub m2, m4        ; m2 = unsigned |m2 - m3|

> +    mova      m3, m2
> +    mova      m4, m2
> +%if mmsize == 16
> +    psrldq    m3, 1
> +    psrldq    m4, 2
> +%else
> +    psrlq     m3, 8
> +    psrlq     m4, 16
> +%endif

psrldq m3, m2, 1
psrldq m4, m2, 2

Same for the other half.

> +%macro CHECK1 0
> +    mova    m3, m0
> +    pcmpgtw m3, m2

pcmpgtw m3, m0, m2

> +    mova    m6, m3
> +    pand    m5, m3
> +    pandn   m3, m1

I suppose m6 is a backup of the mask for use elsewhere, so this can
probably be re-arranged for AVX as well.
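
Something like this, perhaps (untested): build the mask directly in
m6, so the non-destructive AVX pandn makes the backup mova go away:

pcmpgtw m6, m0, m2   ; mask straight into m6, no backup needed
pand    m5, m6
pandn   m3, m6, m1   ; three-operand pandn leaves m6 intact for later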

> +    mova    m1, m3
> +%endmacro

Same.

> +%macro CHECK2 0
> +    paddw   m6, [pw_1]
> +    psllw   m6, 14
> +    paddsw  m2, m6
> +    mova    m3, m0
> +    pcmpgtw m3, m2

pcmpgtw m3, m0, m2

> +    pminsw  m0, m2
> +    pand    m5, m3
> +    pandn   m3, m1
> +    por     m3, m5
> +    mova    m1, m3
> +%endmacro

Again, AVX.

> +%macro FILTER 3
> +.loop%1:
> +    pxor        m7, m7
> +    LOAD         0, [curq+mrefsq]
> +    LOAD         1, [curq+prefsq]
> +    LOAD         2, [%2]
> +    LOAD         3, [%3]
> +    mova        m4, m3
> +    paddw       m3, m2
> +    psraw       m3, 1
> +    mova [rsp+ 0], m0
> +    mova [rsp+16], m3
> +    mova [rsp+32], m1

Use m8-15 on x86-64.
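
A sketch (untested; assumes the cglobal line reserves enough XMM
registers):

%if ARCH_X86_64
    mova        m8, m0
    mova        m9, m3
    mova       m10, m1
%else
    mova [rsp+ 0], m0
    mova [rsp+16], m3
    mova [rsp+32], m1
%endif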

> +    LOAD        3, [prevq+mrefsq]
> +    LOAD        4, [prevq+prefsq]
[..]
> +    LOAD        3, [nextq+mrefsq]
> +    LOAD        4, [nextq+prefsq]
[..]

> +    movu       m2, [curq+mrefsq-1]
> +    movu       m3, [curq+prefsq-1]
[..]
> +    LOAD        2, [%2+mrefsq*2]
> +    LOAD        4, [%3+mrefsq*2]
> +    LOAD        3, [%2+prefsq*2]
> +    LOAD        5, [%3+prefsq*2]
[..]
> +    add      dstq, mmsize/2
> +    add     prevq, mmsize/2
> +    add      curq, mmsize/2
> +    add     nextq, mmsize/2
> +    sub        wd, mmsize/2
> +    cmp        wd, 0
> +    jg .loop%1

So, it seems to me mrefsq/prefsq don't change, so can we increment
the pointers before the filter and then use [dstq] instead? If you do
that, you can add wq to dstq (after sign-extending wd->wq), neg wq,
and index with [dstq+wq] in the main loop; the last 7 lines can then
simply be replaced by:

add wq, mmsize/2
jl .loop%1

(i.e. you lose all the adds; you can lose the cmp regardless, since
the sub already sets the flags it tests.)
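
Rough sketch of the resulting loop (untested, names as in the patch):

    movsxd   wq, wd      ; x86inc's movsxdifnidn does this portably
    add    dstq, wq
    add   prevq, wq
    add    curq, wq
    add   nextq, wq
    neg      wq          ; wq counts from -w up towards 0
.loop%1:
    ; ... body as before, indexing through [dstq+wq] and friends
    add      wq, mmsize/2
    jl .loop%1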

Ronald
