Denis Oliver Kropp wrote:

Quoting Claudio KLaN Ciccani:


+static void Bop_yuy2_Sto_Aop( GenefxState *gfxs )
+{
+     int w;
+     int i;
+     int l = 0;
+     int cr = 0;
+     __u8 *D = gfxs->Aop;
+     __u8 *S = gfxs->Bop;
+     int SperD = gfxs->SperD;
+
+     for (w = 0; w < gfxs->length; w++) {
+          i = (l >> 16) * 2;
+          *D = S[i]; /* blit luma */
+
+          l += SperD;
+
+          if (!(w & 1)) { /* blit chroma */
+               i = (cr >> 16) * 4;
+               *(D+1) = S[i+1];
+               *(D+3) = S[i+3];
+
+               cr += SperD;
+          }
+
+          D += 2;
+     }
+}



You should read and write 32 bits at once; otherwise there is a big performance impact on most non-x86 architectures, and you have to add extra code for big/little endian.
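As a rough illustration of this suggestion, the sketch below builds each output macropixel (two pixels) in a small byte array and emits it with a single 4-byte `memcpy`, which compilers typically lower to one 32-bit store, while luma and chroma positions are still computed separately (the constraint raised in the reply below). The name `scale_yuy2_line32` and its signature are hypothetical, not DirectFB code; the source is still read byte by byte so the result does not depend on host byte order, and no speed measurement is implied.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch, not DirectFB code: horizontally scale one YUY2
 * line (bytes Y0 U Y1 V per 4-byte macropixel), writing one output
 * macropixel per iteration.  SperD is the source step per destination
 * pixel in 16.16 fixed point; dst_w must be even. */
static void scale_yuy2_line32( uint8_t *dst, const uint8_t *src,
                               int dst_w, int SperD )
{
     int j = 0;   /* 16.16 source position of the even output pixel */
     int k;

     assert( dst_w % 2 == 0 );

     for (k = 0; k < dst_w / 2; k++) {
          int     y0 = (j >> 16) * 2;            /* luma byte, even pixel */
          int     y1 = ((j + SperD) >> 16) * 2;  /* luma byte, odd pixel  */
          int     c  = (j >> 17) * 4;            /* source macropixel     */
          uint8_t mp[4];

          mp[0] = src[y0];
          mp[1] = src[c + 1];   /* U */
          mp[2] = src[y1];
          mp[3] = src[c + 3];   /* V */

          /* Emit both output pixels with one 4-byte copy. */
          memcpy( dst + 4 * k, mp, sizeof mp );

          j += 2 * SperD;
     }
}
```

Only the writes are widened here; reading whole 32-bit source words as well would require knowing where each byte lands in the register, which is exactly where the big/little endian question comes in.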




static void Bop_yuy2_Sto_Aop( GenefxState *gfxs )
{
     int   w;
     int   i;
     int   j     = 0;   /* source position, 16.16 fixed point */
     __u8 *D     = gfxs->Aop;
     __u8 *S     = gfxs->Bop;
     int   SperD = gfxs->SperD;

     for (w = 0; w < gfxs->length; w++) {
          /* j >> 16 is the source pixel; its luma is at byte 2 * pixel */
          i  = (j >> 16) * 2;
          *D = S[i]; /* blit luma */

          if (!(w & 1)) { /* blit chroma */
               /* j >> 17 is the source macropixel (2 pixels, 4 bytes) */
               i      = (j >> 17) * 4;
               *(D+1) = S[i+1];
               *(D+3) = S[i+3];
          }

          D += 2;
          j += SperD;
     }
}
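A small harness makes the fixed-point stepping concrete. Since j = w * SperD, at an even pixel w = 2k we get j >> 17 = (2k * SperD) >> 17 = (k * SperD) >> 16, which is exactly the value the separate `cr` accumulator of the original patch held, so dropping `cr` does not change the output. The `GenefxState` struct below is a hypothetical stand-in modelling only the fields the function reads, and `scale_yuy2_line` is a hypothetical helper; neither is DirectFB's real definition.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical minimal stand-ins for the DirectFB types: only the
 * fields that Bop_yuy2_Sto_Aop reads are modelled here. */
typedef uint8_t __u8;

typedef struct {
     void *Aop;    /* destination line (YUY2 bytes) */
     void *Bop;    /* source line (YUY2 bytes) */
     int   length; /* destination width in pixels */
     int   SperD;  /* source pixels per destination pixel, 16.16 fixed point */
} GenefxState;

/* Same logic as the function in the message. */
static void Bop_yuy2_Sto_Aop( GenefxState *gfxs )
{
     int   w;
     int   i;
     int   j     = 0;
     __u8 *D     = gfxs->Aop;
     __u8 *S     = gfxs->Bop;
     int   SperD = gfxs->SperD;

     for (w = 0; w < gfxs->length; w++) {
          i  = (j >> 16) * 2;          /* luma of source pixel */
          *D = S[i];

          if (!(w & 1)) {
               i      = (j >> 17) * 4; /* source macropixel */
               *(D+1) = S[i+1];        /* U */
               *(D+3) = S[i+3];        /* V */
          }

          D += 2;
          j += SperD;
     }
}

/* Hypothetical helper: scale one YUY2 line of src_w pixels to dst_w. */
static void scale_yuy2_line( __u8 *dst, int dst_w, const __u8 *src, int src_w )
{
     GenefxState gfxs;

     /* Two pixels per macropixel, and the chroma store writes D+3,
      * so both widths are assumed to be even. */
     assert( dst_w > 0 && dst_w % 2 == 0 && src_w % 2 == 0 );

     gfxs.Aop    = dst;
     gfxs.Bop    = (void *) src;
     gfxs.length = dst_w;
     gfxs.SperD  = (src_w << 16) / dst_w;

     Bop_yuy2_Sto_Aop( &gfxs );
}
```

Scaling the single macropixel {10, 100, 20, 200} (Y0 U Y1 V) to four pixels yields {10, 100, 10, 200, 20, 100, 20, 200}: each luma sample is repeated, while U and V are reused for both output macropixels.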

Blitting 32 bits at once will be faster, but luma and chroma must be scaled separately; otherwise the image will look horribly corrupted. The function above is the fastest method I have found to do this, _currently_.
Also, we don't need to add extra code for big/little endian: on big-endian systems, swapping Bop_yuy2_Sto_Aop
and Bop_uyvy_Sto_Aop (i.e. using Bop_yuy2_Sto_Aop for UYVY and vice versa) should be sufficient.
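The swap proposed here boils down to a small dispatch decision. The sketch below makes it explicit with a runtime byte-order probe; the enum and helper names are hypothetical, and a real build would more likely decide at compile time with a configure-provided macro. It only encodes the proposal as stated and does not exercise the scalers themselves.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical tags standing in for the YUY2 and UYVY pixel formats. */
typedef enum { LAYOUT_YUY2, LAYOUT_UYVY } ByteLayout;

/* Returns 1 when the host stores the most significant byte first. */
static int host_is_big_endian( void )
{
     const uint32_t probe = 0x01020304;
     uint8_t        first;

     memcpy( &first, &probe, 1 );   /* lowest-addressed byte */
     return first == 0x01;
}

/* Hypothetical helper: choose which byte-wise scaler (the YUY2 or the
 * UYVY variant) to run for a surface format, swapping the two on
 * big-endian hosts as proposed in the message. */
static ByteLayout scaler_for_format( ByteLayout format, int big_endian )
{
     assert( format == LAYOUT_YUY2 || format == LAYOUT_UYVY );

     if (!big_endian)
          return format;

     return format == LAYOUT_YUY2 ? LAYOUT_UYVY : LAYOUT_YUY2;
}
```

Typical use would be `scaler_for_format( LAYOUT_YUY2, host_is_big_endian() )` when selecting the stretch-blit function for a YUY2 source.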



Claudio
