Denis Oliver Kropp wrote:
Quoting Claudio KLaN Ciccani:
+static void Bop_yuy2_Sto_Aop( GenefxState *gfxs )
+{
+ int w;
+ int i;
+ int l = 0;
+ int cr = 0;
+ __u8 *D = gfxs->Aop;
+ __u8 *S = gfxs->Bop;
+ int SperD = gfxs->SperD;
+
+ for (w = 0; w < gfxs->length; w++) {
+ i = (l >> 16) * 2;
+ *D = S[i]; /* blit luma */
+
+ l += SperD;
+
+ if (!(w & 1)) { /* blit chroma */
+ i = (cr >> 16) * 4;
+ *(D+1) = S[i+1];
+ *(D+3) = S[i+3];
+
+ cr += SperD;
+ }
+
+ D += 2;
+ }
+}
You should read and write 32 bits at once; otherwise you get a big performance hit on most non-x86 architectures, and you have to add extra code for big/little endian.
static void Bop_yuy2_Sto_Aop( GenefxState *gfxs )
{
     int   w;
     int   i;
     int   j     = 0;
     __u8 *D     = gfxs->Aop;
     __u8 *S     = gfxs->Bop;
     int   SperD = gfxs->SperD;

     for (w = 0; w < gfxs->length; w++) {
          i = (j >> 16) * 2;
          *D = S[i];              /* blit luma */

          if (!(w & 1)) {         /* blit chroma */
               i = (j >> 17) * 4;
               *(D+1) = S[i+1];
               *(D+3) = S[i+3];
          }

          D += 2;
          j += SperD;
     }
}
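The revised loop walks the source with a single 16.16 fixed-point accumulator: `j >> 16` selects the source pixel for luma, and `j >> 17` (the same value halved) selects the source macropixel that holds U and V. Below is a minimal sketch of just that index arithmetic. It assumes `SperD` is computed as `(src_len << 16) / dst_len`; the patch does not show how GenefxState fills it in, and `fixed_point_indices` is a hypothetical helper.

```c
#include <assert.h>

/* Hypothetical helper: records which source pixel (luma) and which source
 * macropixel (chroma pair) each destination pixel would read, using the
 * same 16.16 stepping as Bop_yuy2_Sto_Aop. The SperD formula is assumed. */
static void fixed_point_indices(int src_len, int dst_len, int *luma, int *pair)
{
     assert(src_len > 0 && dst_len > 0);

     /* source pixels per destination pixel in 16.16: 4 -> 8 gives 0x8000 (0.5) */
     int SperD = (src_len << 16) / dst_len;
     int j     = 0;

     for (int w = 0; w < dst_len; w++) {
          luma[w] = j >> 16;   /* integer part: source pixel index */
          pair[w] = j >> 17;   /* source pixel / 2: macropixel with U and V */
          j += SperD;          /* the fraction carries over between steps */
     }
}
```

Upscaling 4 pixels to 8 gives luma indices 0 0 1 1 2 2 3 3 and macropixel indices 0 0 0 0 1 1 1 1. Since chroma is only sampled on even `w`, each destination pair reads one source pair.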
Blitting 32 bits at once would be faster, but luma and chroma must be scaled separately, otherwise the image
will look horribly corrupted. The above function is the fastest method I have found to do this, _currently_.
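The two points are not necessarily exclusive: you can gather luma and chroma with separate indices, as the function above does, and still write each destination macropixel with one 32-bit store. The sketch below is hypothetical and not benchmarked. `yuy2_scale_span32` and `host_is_little_endian` are illustrative names. It assumes a 4-byte-aligned destination, an even width, and a surface whose bytes sit in Y0 U Y1 V memory order on every host. Under that assumption, the packing depends on byte order, which is where endian-specific code comes in. If the format were instead defined in terms of host-order 32-bit words, only one packing would be needed.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Runtime byte-order probe: nonzero on a little-endian host. */
static int host_is_little_endian(void)
{
     uint16_t probe = 1;
     uint8_t  first;
     memcpy(&first, &probe, 1);
     return first == 1;
}

/* Hypothetical 32-bit-store variant of the YUY2 span scaler.
 * Luma is indexed per pixel and chroma per source pair, as in
 * Bop_yuy2_Sto_Aop; only the store width differs. */
static void yuy2_scale_span32(uint32_t *dst, const uint8_t *src,
                              int dst_len, int SperD)
{
     int little = host_is_little_endian();
     int j      = 0;

     assert((dst_len & 1) == 0);   /* whole macropixels only */

     for (int w = 0; w < dst_len; w += 2) {
          uint32_t y0 = src[(j >> 16) * 2];             /* even pixel luma */
          uint32_t u  = src[(j >> 17) * 4 + 1];         /* pair chroma */
          uint32_t v  = src[(j >> 17) * 4 + 3];
          uint32_t y1 = src[((j + SperD) >> 16) * 2];   /* odd pixel luma */

          /* pack so the bytes land in memory as Y0 U Y1 V */
          if (little)
               *dst++ = y0 | (u << 8) | (y1 << 16) | (v << 24);
          else
               *dst++ = (y0 << 24) | (u << 16) | (y1 << 8) | v;

          j += 2 * SperD;
     }
}
```

For the same input and `SperD`, this produces the same bytes as the byte-wise loop. Whether it is actually faster on a given CPU would need measuring.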
Also, we don't need to add extra code for big/little endian; on big-endian systems, swapping Bop_yuy2_Sto_Aop
and Bop_uyvy_Sto_Aop (i.e. using Bop_yuy2_Sto_Aop for UYVY and vice versa) should be sufficient.
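Here is one way to see why swapping the two routines could be enough. YUY2 is Y0 U Y1 V in memory and UYVY is U Y0 V Y1, so exchanging the two bytes of every 16-bit unit turns one layout into the other. If a big-endian host effectively presents the data with each byte pair reversed, a YUY2 routine reads it as UYVY. That premise about the host is an assumption. The layout relationship itself is shown below with a hypothetical helper, `swap_bytes16`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Exchanges the two bytes of every 16-bit unit in a packed 4:2:2 buffer.
 * Applied to YUY2 (Y0 U Y1 V) it yields UYVY (U Y0 V Y1), and vice versa. */
static void swap_bytes16(uint8_t *buf, size_t len)
{
     assert(len % 2 == 0);
     for (size_t k = 0; k < len; k += 2) {
          uint8_t t  = buf[k];
          buf[k]     = buf[k + 1];
          buf[k + 1] = t;
     }
}
```

Applying it twice restores the original buffer, which matches the symmetric "and vice versa" in the swap.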
Claudio
