Eterm devs,

PATCHES:

1.      As I suspected, the 15bpp C shading routine with saturation turns on
too many bits in adjacent colors, just as the 16bpp C routine did.
This patch corrects that behavior.

2.      I'm not sure how 15bpp is defined.  If the highest bit should always
be zero then there is a bug in the 15bpp MMX shading routines.  I
noticed it while comparing the output of my new 15bpp shading routine to
the old one.  If the highest bit is ALWAYS, and WILL ALWAYS BE, ignored
then this is not an issue.  If not, then the 15bpp MMX shading routine
leaves overflow from the red color modification in the leftmost bit and
it should be cleared.  I have attached a patch to do just that.  This is
only an issue in the saturation section, since the math shows that
without saturation red will never overflow.
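
For illustration, here is a minimal sketch of per-pixel 15bpp shading
with saturation that also keeps the highest bit clear.  It assumes the
modifiers are 8.8 fixed point (256 = 100%), as the ">> 8" in the
pixmap.c patch suggests.  The name shade_pixel_15 is made up for this
example and is not an Eterm function.

```c
#include <stdint.h>

/* Hypothetical helper, not part of Eterm.
 * Pixel layout assumed: 0RRRRRGGGGGBBBBB (bit 15 unused).
 * Modifiers assumed to be 8.8 fixed point: 256 == 100%. */
static uint16_t shade_pixel_15(uint16_t pix,
                               unsigned rm, unsigned gm, unsigned bm)
{
    /* Isolate each 5-bit channel first so neighbours cannot leak in. */
    unsigned r = (((pix >> 10) & 0x1f) * rm) >> 8;
    unsigned g = (((pix >>  5) & 0x1f) * gm) >> 8;
    unsigned b = (( pix        & 0x1f) * bm) >> 8;

    /* Saturate: clamp each channel to its 5-bit maximum. */
    if (r > 0x1f) r = 0x1f;
    if (g > 0x1f) g = 0x1f;
    if (b > 0x1f) b = 0x1f;

    /* Repack; bit 15 is always zero in the result. */
    return (uint16_t)((r << 10) | (g << 5) | b);
}
```

With all three modifiers at 256 or below none of the clamps can fire,
which is the "red will never overflow without saturation" case.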

NOTES:
The x86_64 port of the MMX routines is kinda dead.  While doing the port
it occurred to me that all x86_64 processors have at least SSE2, with
128bit Multi-Media registers, and that there is no reason I shouldn't
take advantage of that.  As a result, the status of the *NEW* SSE2 port
of the shading routines is as follows:

1.      The 15bpp SSE2 shading routines are complete and verified to shade
identically to the 15bpp (patched) C shading routines.  They shade 8
pixels per pass until fewer than 8 pixels remain on the line
(pixels_remaining_for_line / 8 == 0) and then shade one at a time.  This
is twice as many pixels per pass as the MMX routines, so we should see
a corresponding speed improvement.

2.      The 16bpp SSE2 shading routines are complete and verified to shade
identically to the 16bpp (patched) C shading routines.  They get the
same 8 pixels per pass speedup as the 15bpp routines.

3.      The 32bpp routine is currently working with the 64 bit MMX registers
and processing one pixel at a time.  I hope to convert it to use the
full 128 bits and process two pixels at a time.  That is the max as room
for overflow is needed (see note below).  This will more than double the
complexity of this routine but also double its performance.

4.      The 24bpp routine is still under investigation.  There is not a 24bpp
MMX shading routine but that isn't the problem.  The problem is moving
24 bits of data into a processor's register and zero padding the
remainder of the pixel to a byte boundary of 2^n (where n is
non-negative and whole).  24 bits = 3 bytes and there is no 'n' that
works directly.  The only solution is to read a byte at a time.  That's
three reads and three writes for each pixel.  That is actually what the
C routine does by manipulating the three unsigned chars.  Once each
pixel is loaded the shading is identical to the 32bpp routines but the
overhead of unpacking the 24 bits into 32 and then repacking is not
looking to be worth it, especially if after all of that work we can only
process two pixels at a time.  I attempted a workaround that reads the
data 32 bits at a time and simply writes the topmost 8 bits back out
unchanged when storing the other 24 bits of the pixel.  If anybody has
any suggestions I've overlooked on this topic then _PLEASE_ speak up.
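
A rough sketch of the 32-bit read workaround from item 4, in plain C
rather than SSE2.  Assumptions that are not in the text above: a
little-endian host (x86), 8.8 fixed-point modifiers (256 = 100%), and
the made-up names scale8 and shade_line_24.  The channels are named by
byte offset (m0, m1, m2) because the byte order of the image is not
stated here.  The last pixel is handled a byte at a time so the 4-byte
access never touches memory past the end of the line.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Saturating 8.8 fixed-point scale of one 8-bit channel (256 == 1.0). */
static uint32_t scale8(uint32_t c, uint32_t m)
{
    uint32_t v = (c * m) >> 8;
    return v > 0xff ? 0xff : v;
}

/* Shade `count` packed 24bpp pixels in place (hypothetical helper).
 * Assumes a little-endian host, so byte k of the pixel is bits
 * 8k..8k+7 of the loaded word. */
static void shade_line_24(uint8_t *p, size_t count,
                          uint32_t m0, uint32_t m1, uint32_t m2)
{
    while (count > 1) {
        uint32_t v;

        /* Unaligned-safe 32-bit load: this pixel plus the first
         * byte of the next one. */
        memcpy(&v, p, 4);

        /* Shade the low 24 bits, pass the top 8 bits through. */
        v = (v & 0xff000000u)
          | (scale8((v >> 16) & 0xff, m2) << 16)
          | (scale8((v >>  8) & 0xff, m1) <<  8)
          |  scale8( v        & 0xff, m0);

        memcpy(p, &v, 4);
        p += 3;
        count--;
    }

    if (count == 1) {
        /* Final pixel: three byte accesses, no read past the buffer. */
        p[0] = (uint8_t)scale8(p[0], m0);
        p[1] = (uint8_t)scale8(p[1], m1);
        p[2] = (uint8_t)scale8(p[2], m2);
    }
}
```

This still costs one load and one store per pixel, and each store
rewrites a byte that belongs to the next pixel, so it only shows that
the idea is workable, not that it is faster than the byte-wise C loop.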

Things to note (maybe for the Eterm man page under --cmod):  All of the
colors of all of the pixels need some room for overflow during the
intermediate steps of the shading.  Although no hard errors will occur,
strange behavior will happen when color * modifier exceeds the
temporary storage.  For 15 & 16 bpp modes the overflow bits are:

15bpp   5 bits red      5 bits green    5 bits blue
        3 bit overflow  3 bit overflow  3 bit overflow  
16bpp   5 bits red      6 bits green    5 bits blue
        3 bit overflow  2 bit overflow  3 bit overflow  

This is true for all the shading routines: C, MMX, and SSE2.  In 24 & 32
bpp modes the color consumes the entire byte and so a word is used for
the intermediate values.  Therefore each color of each pixel has a full
8 bits for overflow.  The colors are still condensed back to 8 bits upon
completion though.  It is impossible to use a couple of bits from the
alpha channel for overflow as the working size must be byte aligned and
the first size above 8 bits is 16.
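
To put rough numbers on that headroom, the sketch below computes the
largest modifier that keeps a full-intensity channel inside its
temporary storage.  It rests on a simplified model that is an
assumption, not a description of the actual routines: the modifier is
8.8 fixed point (256 = 100%) and the value that has to fit in
color bits + overflow bits is (color * modifier) >> 8.  The name
max_safe_modifier is made up for this example.

```c
#include <stdint.h>

/* Largest 8.8 fixed-point modifier m for which
 *   (max_color * m) >> 8
 * still fits in (color_bits + overflow_bits) bits, where
 * max_color = 2^color_bits - 1.  Simplified model, see the text.
 * Requires color_bits + overflow_bits + 8 < 32. */
static uint32_t max_safe_modifier(unsigned color_bits,
                                  unsigned overflow_bits)
{
    uint32_t max_color = (1u << color_bits) - 1u;

    /* Largest product whose >> 8 is still below
     * 2^(color_bits + overflow_bits). */
    uint32_t limit = (1u << (color_bits + overflow_bits + 8u)) - 1u;

    return limit / max_color;
}
```

Under this model the 5-bit channels with 3 overflow bits tolerate a
modifier of a little over 8x, the 6-bit green channel in 16bpp a little
over 4x, and the 8-bit channels in 24 & 32 bpp roughly 257x.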

While lurking on the #gentoo-dev channel I noticed some of the devs
bitching about the register allocator in gcc (v. 4 I think).  The MMX
routines expect the register allocator to behave a certain way and will
bitch loudly (or segfault) if its behavior changes, because the incoming
parameters will be in unpredictable locations.  To avoid any problems
with this issue I have opted to write the SSE2 routines using inline
assembly.  Even if I had written them in pure assembly, combining them
with mmx_cmod.S would have required more #ifdefs than code.  Sorry
Mej!  :-/  I started to do it that way for you but if you saw the code
you'd flip.  Much more detailed info is in the comments at the top of
the new file and will be submitted soon.
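
As an illustration of why inline assembly sidesteps the register
allocation problem, here is a small sketch using GCC extended asm.  The
operands are bound by constraints, so the compiler decides where dst and
src live and substitutes those registers itself, and nothing depends on
where the parameters arrive.  This is not the SSE2 shading code: the
function name add_saturate_u16x8 is made up, and it only demonstrates
one saturating add on eight 16-bit values, with a plain C fallback when
SSE2 or GCC-style asm is unavailable.

```c
#include <stdint.h>

/* dst[i] = saturating_add(dst[i], src[i]) for eight 16-bit words.
 * Hypothetical example function, not taken from Eterm. */
static void add_saturate_u16x8(uint16_t dst[8], const uint16_t src[8])
{
#if defined(__SSE2__) && defined(__GNUC__)
    __asm__ volatile (
        "movdqu  (%0), %%xmm0\n\t"     /* load 8 words from dst      */
        "movdqu  (%1), %%xmm1\n\t"     /* load 8 words from src      */
        "paddusw %%xmm1, %%xmm0\n\t"   /* unsigned saturating add    */
        "movdqu  %%xmm0, (%0)\n\t"     /* store the result to dst    */
        :                               /* no register outputs        */
        : "r" (dst), "r" (src)          /* compiler picks the regs    */
        : "xmm0", "xmm1", "memory");    /* tell it what we clobber    */
#else
    /* Portable fallback with the same result. */
    for (int i = 0; i < 8; i++) {
        uint32_t s = (uint32_t)dst[i] + src[i];
        dst[i] = s > 0xffffu ? 0xffffu : (uint16_t)s;
    }
#endif
}
```

One 128 bit register holds eight 16-bit words, which is where the
8 pixels per pass in 15 & 16 bpp mode comes from.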

Is there a way to look at Eterm-0.9.4/src/pixmap.c without getting the
entire CVS tree?  A link to a web page with the latest pixmap.c source
in CVS would be awesome!  TIA.

Time to sleeeeep,
The River Rat

P.S.    I still have the MMX port to 64bit for the 32 & 16 bpp shading
routines if anyone is interested.  These routines run on a 64 bit
processor but use standard MMX calls (not SSE2) and only use 64bits of
the Multi-Media registers.  If anyone is interested in them speak up now
or I'll probably just delete them at the completion of the SSE2 port.
They DO work though!  8-)

-- 
Tres
--- Eterm-0.9.3-orig/src/mmx_cmod.S	2004-01-11 15:13:02.000000000 -0700
+++ Eterm-0.9.3/src/mmx_cmod.S	2005-05-07 07:45:07.000000000 -0600
@@ -198,6 +201,7 @@
         paddusw %mm3, %mm1      /* ff eg */
         paddusw %mm3, %mm2      /* ff eb */
 
+        psubw %mm3, %mm0        /* 00 0r */
         psubw %mm3, %mm1        /* 00 0g */
         psubw %mm3, %mm2        /* 00 0b */
         
@@ -234,6 +238,7 @@
         paddusw %mm3, %mm1      /* ff eg */
         paddusw %mm3, %mm2      /* ff eb */
 
+        psubw %mm3, %mm0        /* 00 0r */
         psubw %mm3, %mm1        /* 00 0g */
         psubw %mm3, %mm2        /* 00 0b */
         
--- Eterm-0.9.3-orig/src/pixmap.c	2004-07-22 14:12:31.000000000 -0600
+++ Eterm-0.9.3/src/pixmap.c	2005-05-07 07:54:00.000000000 -0600
@@ -1559,16 +1590,14 @@
             for (x = -w; x < 0; x++) {
                 int r, g, b;
 
                 b = ((DATA16 *) ptr)[x];
-                r = (b & 0x7c00) * rm;
-                g = (b & 0x3e0) * gm;
-                b = (b & 0x1f) * bm;
-                r |= (!(r >> 15) - 1);
-                g |= (!(g >> 10) - 1);
-                b |= (!(b >> 5) - 1);
-                ((DATA16 *) ptr)[x] = ((r >> 8) & 0x7c00)
-                    | ((g >> 8) & 0x3e0)
-                    | ((b >> 8) & 0x1f);
+                r = (((b >> 10 ) & 0x001f ) * rm ) >> 8;
+                r = ( r > 0x001f ) ? 0x7c00 : ( r << 10 );
+                g = (((b >>  5 ) & 0x001f ) * gm ) >> 8;
+                g = ( g > 0x001f ) ? 0x03e0 : ( g << 5 );
+                b = (( b         & 0x001f ) * bm ) >> 8;
+                b = ( b > 0x001f ) ? 0x001f : b;
+                ((DATA16 *) ptr)[x] = (r|g|b);
             }
             ptr += bpl;
         }
