> +%macro loop_yuyvToY 2
Can we stick to macros having all-caps names?
> +.loop_%1:
> + mov%1 m0, [srcq+wq*2] ; (byte) { Y0, U0, Y1, V0, ... }
> + mov%1 m1, [srcq+wq*2+mmsize] ; (byte) { Y8, U4, Y9, V4, ... }
> +%ifidn %2, yuyv
> + pand m0, m2 ; (word) { Y0, Y1, ..., Y7 }
> + pand m1, m2 ; (word) { Y8, Y9, ..., Y15 }
With AVX, if the source is aligned, the load can be folded into the pand:
pand m0, m2, [srcq+wq*2]
Additionally, VEX-encoded AVX instructions accept unaligned memory
operands, so you could add an ifdef of that sort here to fold the load
in the unaligned case too (but x86inc has no explicit support for this
sort of thing in the abstraction layer).
> +%else ; uyvy
> + psrlw m0, 8 ; (word) { Y0, Y1, ..., Y7 }
> + psrlw m1, 8 ; (word) { Y8, Y9, ..., Y15 }
> +%endif ; yuyv/uyvy
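I don't think the same trick works for the shift, though: with AVX, the
immediate form of psrlw only takes a register source, so the load has
to stay a separate instruction here.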
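
For anyone following along, the yuyv/uyvy branches above differ only in
where Y sits inside each little-endian 16-bit word. Here is a minimal
scalar C sketch of that idea. It is not the patch's code: the function
and enum names are hypothetical, and I am assuming m2 holds 0x00FF in
every word, which is what the "(word)" comments imply.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names, used only for this illustration. */
enum packed_layout { LAYOUT_YUYV, LAYOUT_UYVY };

/* Extract luma from packed 4:2:2 input. Each pixel occupies one
 * little-endian 16-bit word: YUYV keeps Y in the low byte, UYVY keeps
 * it in the high byte. */
void packed_to_y(uint8_t *dst, const uint8_t *src, size_t width,
                 enum packed_layout layout)
{
    assert(dst != NULL && src != NULL);
    for (size_t i = 0; i < width; i++) {
        /* Build the word the way an x86 SIMD load would see it. */
        uint16_t word = (uint16_t)(src[2 * i] | (src[2 * i + 1] << 8));
        if (layout == LAYOUT_YUYV)
            word &= 0x00FF;  /* models pand m0, m2 (assumed mask 0x00FF) */
        else
            word >>= 8;      /* models psrlw m0, 8 */
        dst[i] = (uint8_t)word;  /* value already fits in one byte */
    }
}
```

Both layouts produce the same luma plane; only the mask-versus-shift
step changes, which is why a single macro parameter can select it.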
> +%macro loop_yuyvToUV 2
Same comments here as above.
> + mova m1, m0
With AVX, the three-operand form makes this copy unnecessary.
> + mov%1 m0, [srcq+wq*2] ; (byte) { U0, V0, U1, V1, ... }
> + mov%1 m1, [srcq+wq*2+mmsize] ; (byte) { U8, V8, U9, V9, ... }
> + mova m2, m0
> + mova m3, m1
> + pand m0, m4 ; (word) { U0, U1, ..., U7 }
> + pand m1, m4 ; (word) { U8, U9, ..., U15 }
> + psrlw m2, 8 ; (word) { V0, V1, ..., V7 }
> + psrlw m3, 8 ; (word) { V8, V9, ..., V15 }
> + packuswb m0, m1 ; (byte) { U0, ..., U15 }
> + packuswb m2, m3 ; (byte) { V0, ..., V15 }
Loooots of AVX possible here. Otherwise, looks good!
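
As a reference for the U/V split quoted above, here is a minimal scalar
C sketch of what the pand/psrlw/packuswb sequence computes on
interleaved U,V bytes. It is an illustration, not the patch's code: the
function names are hypothetical, and I am assuming m4 holds 0x00FF in
every word.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Models packuswb for one lane: treat the word as signed and saturate
 * it to the 0..255 range. */
static uint8_t pack_word(uint16_t w)
{
    int16_t s = (int16_t)w;
    if (s < 0)
        return 0;
    if (s > 255)
        return 255;
    return (uint8_t)s;
}

/* Split interleaved { U0, V0, U1, V1, ... } bytes into two planes.
 * Hypothetical helper, named only for this example. */
void split_uv(uint8_t *dst_u, uint8_t *dst_v, const uint8_t *src,
              size_t pairs)
{
    assert(dst_u != NULL && dst_v != NULL && src != NULL);
    for (size_t i = 0; i < pairs; i++) {
        /* One little-endian word: U in the low byte, V in the high. */
        uint16_t word = (uint16_t)(src[2 * i] | (src[2 * i + 1] << 8));
        uint16_t u = word & 0x00FF;  /* models pand m0, m4 */
        uint16_t v = word >> 8;      /* models psrlw m2, 8 */
        dst_u[i] = pack_word(u);     /* models packuswb m0, m1 */
        dst_v[i] = pack_word(v);     /* models packuswb m2, m3 */
    }
}
```

Since both u and v are at most 255 after the mask or shift, the
saturation in the pack step never changes a value here.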
Jason