> +%macro loop_yuyvToY 2

Can we stick to macros having all-caps names?

> +.loop_%1:
> +    mov%1          m0, [srcq+wq*2]        ; (byte) { Y0, U0, Y1, V0, ... }
> +    mov%1          m1, [srcq+wq*2+mmsize] ; (byte) { Y8, U4, Y9, V4, ... }
> +%ifidn %2, yuyv
> +    pand           m0, m2                 ; (word) { Y0, Y1, ..., Y7 }
> +    pand           m1, m2                 ; (word) { Y8, Y9, ..., Y15 }

If the source is aligned, AVX lets you fold the load into the op:

pand m0, m2, [srcq+wq*2]

Additionally, AVX itself supports unaligned memory operands, so you
could add an ifdef here to use this form in the unaligned case as well
(x86inc has no explicit support for this sort of thing in the
abstraction layer).

> +%else ; uyvy
> +    psrlw          m0, 8                  ; (word) { Y0, Y1, ..., Y7 }
> +    psrlw          m1, 8                  ; (word) { Y8, Y9, ..., Y15 }
> +%endif ; yuyv/uyvy

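I don't think psrlw m0, [mem], 8 works here, though: the VEX-encoded
shift-by-immediate forms only take a register source (a memory source
needs AVX-512), so the load has to stay separate.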
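
For anyone following along, here is a rough scalar C sketch of what the
two branches compute. The yuyv branch masks each 16-bit word with 0x00FF
(the pand) and the uyvy branch shifts it right by 8 (the psrlw). The
function names are mine, not from the patch, and I'm assuming m2 holds a
0x00FF word mask, which is what the comments in the patch suggest.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model of the yuyv branch: read each pixel as a
 * little-endian 16-bit word and keep the low byte (pand with 0x00FF). */
static void yuyv_to_y(uint8_t *dst, const uint8_t *src, size_t width)
{
    assert(dst != NULL && src != NULL);
    for (size_t i = 0; i < width; i++) {
        uint16_t word = (uint16_t)(src[2 * i] | (src[2 * i + 1] << 8));
        dst[i] = (uint8_t)(word & 0x00FF);
    }
}

/* Hypothetical scalar model of the uyvy branch: same word, but luma is
 * in the high byte, so shift right by 8 (psrlw by 8). */
static void uyvy_to_y(uint8_t *dst, const uint8_t *src, size_t width)
{
    assert(dst != NULL && src != NULL);
    for (size_t i = 0; i < width; i++) {
        uint16_t word = (uint16_t)(src[2 * i] | (src[2 * i + 1] << 8));
        dst[i] = (uint8_t)(word >> 8);
    }
}
```

The SIMD loop does the same thing for a full register of words at a
time and then packs the words back down to bytes.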

> +%macro loop_yuyvToUV 2

Same comments here as above.

> +    mova           m1, m0

AVX: the three-operand form makes this copy unnecessary.

> +    mov%1          m0, [srcq+wq*2]        ; (byte) { U0, V0, U1, V1, ... }
> +    mov%1          m1, [srcq+wq*2+mmsize] ; (byte) { U8, V8, U9, V9, ... }
> +    mova           m2, m0
> +    mova           m3, m1
> +    pand           m0, m4                 ; (word) { U0, U1, ..., U7 }
> +    pand           m1, m4                 ; (word) { U8, U9, ..., U15 }
> +    psrlw          m2, 8                  ; (word) { V0, V1, ..., V7 }
> +    psrlw          m3, 8                  ; (word) { V8, V9, ..., V15 }
> +    packuswb       m0, m1                 ; (byte) { U0, ..., U15 }
> +    packuswb       m2, m3                 ; (byte) { V0, ..., V15 }

Lots of AVX possible here: each mova copy can be folded into a
three-operand pand/psrlw. Otherwise, looks good!
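
In case it helps when checking an AVX version against the current one,
this is a rough scalar C model of the UV split quoted above: mask for U,
shift for V, then narrow to bytes with unsigned saturation the way
packuswb does. The names are made up for illustration, and I'm assuming
m4 is a 0x00FF word mask as the patch comments imply.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: narrow a signed 16-bit word to a byte with
 * unsigned saturation, which is what packuswb does per element. */
static uint8_t pack_us(int16_t w)
{
    if (w < 0)
        return 0;
    if (w > 255)
        return 255;
    return (uint8_t)w;
}

/* Hypothetical scalar model: src holds interleaved { U0, V0, U1, V1, ... }.
 * U is the low byte of each 16-bit word (pand with 0x00FF), V is the
 * high byte (psrlw by 8). */
static void split_uv(uint8_t *dst_u, uint8_t *dst_v,
                     const uint8_t *src, size_t width)
{
    assert(dst_u != NULL && dst_v != NULL && src != NULL);
    for (size_t i = 0; i < width; i++) {
        uint16_t word = (uint16_t)(src[2 * i] | (src[2 * i + 1] << 8));
        dst_u[i] = pack_us((int16_t)(word & 0x00FF));
        dst_v[i] = pack_us((int16_t)(word >> 8));
    }
}
```

Both intermediate values fit in 0..255, so the saturation in the pack
step never actually triggers here.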

Jason