Brian,

On 2002.04.12 20:14 Brian Paul wrote:
> ...
> 
> I'd like to see Mesa satisfy the 255*255=255 identity.  Is it hard to
> implement that in the MMX code?  If it is, we could let it go for now
> and see if anyone complains.
> 

I guess you haven't received my previous email yet. This is already 
satisfied by the MMX code, which (as it is now) _always_ gives the exact 
results for 8 bit, including in these extreme cases.

But your comments sort of answer my previous question regarding this as 
well.

> ...
> 
> It's been at least a year since I touched that code.  As far as I can
> remember the comments are correct.  Though I don't remember if it was
> an issue at 5/6/5 or 8/8/8 color depth, or both.  I don't know what
> else might have changed since then to cause different results with
> Glean.
> 

It's an issue just with 8/8/8 color depth.

> 
> > Thanks for all your good work, by the way!
> 
> Yes!
> 
> -Brian
> 

So I guess it's probably best to leave the code as it is now... but wait!

And what if we do:

        t/255 ~= (t + (t>>8) + (t>> 15)) >> 8

this gives 255 for t = 255*255.
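
To double-check the identity, here is a minimal C sketch of the formula 
above next to the two-term version without the (t >> 15) term. The 
function names are only mine for illustration; they are not in Mesa, and 
I'm assuming t is a product of two 8-bit values:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: t/255 approximated as in the formula above.
 * Assumes 0 <= t <= 255*255, so every intermediate sum fits in
 * 16 bits (the largest is 65280). */
uint16_t div255_approx(uint16_t t)
{
    return (uint16_t)((t + (t >> 8) + (t >> 15)) >> 8);
}

/* Hypothetical helper: the same without the (t >> 15) correction. */
uint16_t div255_uncorrected(uint16_t t)
{
    return (uint16_t)((t + (t >> 8)) >> 8);
}

/* Checks both end points of the range. */
void div255_check_identities(void)
{
    assert(div255_approx(0) == 0);
    assert(div255_approx(255u * 255u) == 255);      /* 255*255 -> 255 */
    assert(div255_uncorrected(255u * 255u) == 254); /* one short      */
}
```

For t = 255*255 the three-term sum is 65025 + 254 + 1 = 65280 = 255 << 8, 
while the two-term sum stops at 65279, which shifts down to 254.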

I made some further enquiries:
  - it also needs 16-bit arithmetic only.
  - it doesn't give the exact results in 4.241.987 out of 16.777.216 
possible cases, i.e., it is exact about 75% of the time.
  - it's very easy to code; in fact it's already done for the MMX code (see 
the initial patch attached).
  - it also gives a 6% speedup: in my benchmark, from the previous 3.637088 
sec down to 3.429032 sec. Plus a little more once I optimize the assembly 
code a little further, since the absence of rounding frees some registers.
  - and glean likes it, since it gives an error of just 0.522796 !!

I guess we've got ourselves a new method: Fonseca's method!! He!He!

Now for real. I would appreciate comments on this as I don't believe much 
in wonderful discoveries..


José Fonseca


PS: In case you're wondering whether all these details really matter so 
much, most of the stuff here will apply to the C code optimizations as 
well. And don't forget that when 64-bit processors get into our homes, 
we can do what is being done now in the MMX code directly in the C code! Of 
course, most of the readers have 3D cards that do this much faster & 
better... but there's no fun in that! ;-)


PPS: I'm renaming the subjects because I've noticed that my threads have 
the nasty habit of going on and on forever, and I want people to still 
read my emails! ;-)
Index: mmx_blend.S
===================================================================
RCS file: /cvsroot/mesa3d/Mesa/src/X86/mmx_blend.S,v
retrieving revision 1.8
diff -u -r1.8 mmx_blend.S
--- mmx_blend.S 10 Apr 2002 16:32:32 -0000      1.8
+++ mmx_blend.S 12 Apr 2002 20:48:55 -0000
@@ -39,7 +39,15 @@
  *
  * achieving the exact results
  */
-#define GMBT_ROUNDOFF          1
+#define GMBT_ROUNDOFF          0
+
+/* instead of the roundoff this adds a small correction to satisfy the OpenGL criteria
+ *
+ *   t/255 ~= (t + (t >> 8) + (t >> 15)) >> 8
+ *
+ * note that although is faster than rounding off it doesn't give always the exact results
+ */
+#define GMBT_GEOMETRIC_CORRECTION      1
 
 /*
  * do
@@ -282,6 +290,14 @@
 
     PADDW      ( MM3, MM2 )                    /*        t1 + (t1 >> 8) ~= (t1/255) << 8        */
     PADDW      ( MM5, MM6 )                    /*        t2 + (t2 >> 8) ~= (t2/255) << 8        */
+
+#if GMBT_GEOMETRIC_CORRECTION 
+    PSRLW      ( CONST(7), MM3 )               /*                    t1 >> 15                   */
+    PSRLW      ( CONST(7), MM5 )               /*                    t2 >> 15                   */
+
+    PADDW      ( MM3, MM2 )                    /*  t1 + (t1 >> 8) + (t1 >>15) ~= (t1/255) << 8  */
+    PADDW      ( MM5, MM6 )                    /*  t2 + (t2 >> 8) + (t2 >>15) ~= (t2/255) << 8  */
+#endif
 #endif
 
 #if GMBT_SIGNED_ARITHMETIC
