Brian,

On 2002.04.12 20:14 Brian Paul wrote:
> ...
>
> I'd like to see Mesa satisfy the 255*255=255 identity.  Is it hard to
> implement that in the MMX code?  If it is, we could let it go for now
> and see if anyone complains.
>

I guess you didn't receive my previous email yet. This is already
satisfied by the MMX code, which (as it is now) _always_ gives the exact
results for 8 bit, including in these extreme cases. But your comments
sort of answer my previous question regarding this as well.

> ...
>
> It's been at least a year since I touched that code.  As far as I can
> remember the comments are correct.  Though I don't remember if it was
> an issue at 5/6/5 or 8/8/8 color depth, or both.  I don't know what
> else might have changed since then to cause different results with
> Glean.
>

It's an issue just with 8/8/8 color depth.

>
> > Thanks for all your good work, by the way!
>
> Yes!
>
> -Brian
>

So I guess it's probably best to leave the code as it is now... but wait!
What if we do:

    t/255 ~= (t + (t >> 8) + (t >> 15)) >> 8

This gives 255 for t = 255*255. I made some further enquiries:

 - it also needs 16-bit arithmetic only.
 - it fails to give the exact result in just 4,241,987 out of 16,777,216
   possible cases, i.e., it is exact about 75% of the time.
 - it is very easy to code; in fact it is already done for the MMX code
   (see the initial patch attached).
 - it also gives a 6% speedup: in my benchmark, from the previous
   3.637088 sec to 3.429032 sec. Plus a little more once I optimize the
   assembly code further, since the absence of rounding frees some
   registers.
 - and glean likes it, since it gives an error of just 0.522796 !!

I guess we've got ourselves a new method: Fonseca's method!! He!He!

Now for real. I would appreciate comments on this, as I don't believe
much in wonderful discoveries...

José Fonseca

PS: In case you're wondering whether all these details really matter so
much: most of the stuff here will apply to the C code optimizations as
well. And don't forget that when 64-bit processors get into our homes we
can do directly in the C code what is now being done in the MMX code! Of
course, most of the readers have 3D cards that do this much faster &
better... but there's no fun in that! ;-)

PPS: I'm renaming the subjects because I've noticed that my threads have
the nasty habit of going on and on forever, and I want people to still
read my emails! ;-)

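For anyone who wants to check the formula without touching the assembly, here is a minimal C sketch of it. The function names (`div255_correction`, `div255_rounded`, `count_mismatches`) are made up for illustration and are not Mesa code, and "exact" is assumed here to mean t/255 rounded to nearest. The loop only covers products of two 8-bit values, so its count is not the same statistic as the 16,777,216-case figure quoted above.

```c
#include <assert.h>
#include <stdint.h>

/* Approximation proposed in the message, for t in [0, 255*255]:
 *   t/255 ~= (t + (t >> 8) + (t >> 15)) >> 8
 * The sum is deliberately kept in a 16-bit variable. */
static uint8_t div255_correction(uint16_t t)
{
    uint16_t sum = (uint16_t)(t + (t >> 8) + (t >> 15));
    return (uint8_t)(sum >> 8);
}

/* Reference for comparison only: t/255 rounded to nearest
 * (an assumption about what "exact" means). */
static uint8_t div255_rounded(uint16_t t)
{
    return (uint8_t)((t + 127u) / 255u);
}

/* Counts the products t = a*b (a, b in 0..255) for which the two
 * functions disagree, and checks that the intermediate sum never
 * needs more than 16 bits. */
static unsigned count_mismatches(void)
{
    unsigned a, b, n = 0;

    for (a = 0; a < 256; a++) {
        for (b = 0; b < 256; b++) {
            uint16_t t = (uint16_t)(a * b);

            assert((unsigned)t + (t >> 8) + (t >> 15) <= 0xFFFFu);
            if (div255_correction(t) != div255_rounded(t))
                n++;
        }
    }
    return n;
}
```

Calling `div255_correction(255 * 255)` should return 255, which is the identity discussed above; `count_mismatches()` gives a rough idea of how often the shortcut differs from rounding for plain 8-bit products.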
Index: mmx_blend.S
===================================================================
RCS file: /cvsroot/mesa3d/Mesa/src/X86/mmx_blend.S,v
retrieving revision 1.8
diff -u -r1.8 mmx_blend.S
--- mmx_blend.S 10 Apr 2002 16:32:32 -0000 1.8
+++ mmx_blend.S 12 Apr 2002 20:48:55 -0000
@@ -39,7 +39,15 @@
  *
  * achieving the exact results
  */
-#define GMBT_ROUNDOFF 1
+#define GMBT_ROUNDOFF 0
+
+/* instead of the roundoff this adds a small correction to satisfy the OpenGL criteria
+ *
+ * t/255 ~= (t + (t >> 8) + (t >> 15)) >> 8
+ *
+ * note that although is faster than rounding off it doesn't give always the exact results
+ */
+#define GMBT_GEOMETRIC_CORRECTION 1
 
 /*
  * do
@@ -282,6 +290,14 @@
 
     PADDW ( MM3, MM2 ) /* t1 + (t1 >> 8) ~= (t1/255) << 8 */
     PADDW ( MM5, MM6 ) /* t2 + (t2 >> 8) ~= (t2/255) << 8 */
+
+#if GMBT_GEOMETRIC_CORRECTION
+    PSRLW ( CONST(7), MM3 ) /* t1 >> 15 */
+    PSRLW ( CONST(7), MM5 ) /* t2 >> 15 */
+
+    PADDW ( MM3, MM2 ) /* t1 + (t1 >> 8) + (t1 >>15) ~= (t1/255) << 8 */
+    PADDW ( MM5, MM6 ) /* t2 + (t2 >> 8) + (t2 >>15) ~= (t2/255) << 8 */
+#endif
 #endif
 
 #if GMBT_SIGNED_ARITHMETIC
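
The second hunk shifts by 7 rather than 15, which may look odd at first: judging from the comments in the patch, MM3 already holds t1 >> 8 at that point, so seven more bits give t1 >> 15. Below is a rough C sketch of one 16-bit lane of that sequence. The function `lane_div255` is hypothetical, and both the initial t1 >> 8 and the final >> 8 are assumptions about code outside the hunk.

```c
#include <stdint.h>

/* One 16-bit lane of the patched sequence. Variable names mirror the
 * registers in the patch; each step wraps modulo 2^16, like PADDW. */
static uint16_t lane_div255(uint16_t t1)
{
    uint16_t mm2 = t1;
    uint16_t mm3 = (uint16_t)(t1 >> 8);  /* assumed: done before the hunk */

    mm2 = (uint16_t)(mm2 + mm3);         /* PADDW ( MM3, MM2 )      */
    mm3 = (uint16_t)(mm3 >> 7);          /* PSRLW ( CONST(7), MM3 ) */
    mm2 = (uint16_t)(mm2 + mm3);         /* PADDW ( MM3, MM2 )      */

    return (uint16_t)(mm2 >> 8);         /* assumed: final shift after the hunk */
}
```

For t1 = 255*255 = 65025 the lane holds 65025 + 254 + 1 = 65280 before the final shift, which still fits in 16 bits and yields 255.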