On 2002.04.10 17:42 Brian Paul wrote:
> 
> José,
> 
> I've checked in the code after testing with Glean and the OpenGL
> conformance
> tests.
> 

Great.

> Was I supposed to change something in the C code?  It passes the
> conformance tests as-is.
> 

I was surprised that the C code passed the conformance tests because, with 
the signed arithmetic, it doesn't give the same results as before. So I 
made a small comparison of the several methods (test program attached):

        // Nathan's method - unsigned 24bit arithmetic
        // NOTE: this was the original Mesa code
        t1 = p*a + q*(255 - a);
        s1 = (t1 + (t1 << 8) + 256) >> 16;
                 
        // Nathan's method - signed 24bit arithmetic (one multiply less)
        // NOTE: this is what I changed it to, and how it is now
        t2 = (p - q)*a;
        s2 = (t2 + (t2 << 8) + 256) >> 16;
        s2 += q;
        s2 &= 0xff;
                 
        // Blinn's method - unsigned 16bit arithmetic
        // NOTE: is exact
        t3 = p*a + q*(255-a) + 128;
        s3 = (t3 + (t3 >> 8)) >> 8;
                 
        // Blinn's method - signed 16bit arithmetic (one multiply less)
        // NOTE: is exact because the negative sign is considered
        t4 = ((p - q)*a + (p > q ? 128 : -128)) & 0xffff;
        s4 = (t4 + (t4 >> 8)) >> 8;
        s4 += q;
        s4 &= 0xff;

When one compares with the exact result

        // exact result - rounded
        s = (unsigned) (((double)p)*(((double)a)/255.0) +
                        ((double)q)*(1.0-((double)a)/255.0) + 0.5);

one gets:

        1: 8164890 differences in 16777216
        2: 8148697 differences in 16777216
        3: 0 differences in 16777216
        4: 0 differences in 16777216

So, in spite of the different results between 1 and 2, method 2 gives 
better results overall!
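
As a cross-check that does not depend on floating point, the rounding step 
shared by methods 3 and 4 can be compared against an integer-only 
reference. This is only a sketch: the function names below are 
hypothetical and do not come from Mesa, and the reference uses the identity 
round(x/255) = floor((2x + 255)/510), which holds because x/255 can never 
land exactly on a half (255 is odd).

```c
#include <stdint.h>

/* Rounded division by 255 with shifts and adds only; this is the
 * "+128, then t + (t >> 8), then >> 8" step used by methods 3 and 4. */
uint32_t div255_round(uint32_t x)
{
    uint32_t t = x + 128;
    return (t + (t >> 8)) >> 8;
}

/* Integer-only reference (hypothetical helper):
 * round(x / 255) == floor((2x + 255) / 510). */
uint32_t div255_ref(uint32_t x)
{
    return (2 * x + 255) / 510;
}

/* Count the inputs in 0..255*255 for which the two functions disagree. */
unsigned count_div255_mismatches(void)
{
    unsigned mismatches = 0;
    uint32_t x;

    for (x = 0; x <= 255u * 255u; ++x)
        if (div255_round(x) != div255_ref(x))
            ++mismatches;
    return mismatches;
}
```

The range 0..255*255 covers every value that p*a + q*(255 - a) can take 
for 8-bit p, q and a, so a mismatch count of zero corresponds to the 
"0 differences" line reported for method 3.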

What happens is that method 1 is designed to follow the truncated result, 
not the rounded one. If one compares with the truncated result

        // truncated result
        s = (unsigned) (((double)p)*(((double)a)/255.0) +
                        ((double)q)*(1.0-((double)a)/255.0));

one gets:

        1: 15467 differences in 16777216
        2: 31660 differences in 16777216
        3: 8180357 differences in 16777216
        4: 8180357 differences in 16777216

Notice that, from this point of view, method 2 is indeed worse, but this 
really doesn't matter because it is the wrong point of view.

This explains why the current C code passes the conformance tests.
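
A single input makes the rounded/truncated distinction concrete. The 
sketch below wraps methods 1 and 3 from the listing above in two functions 
(the function names are hypothetical, chosen only for this illustration).

```c
/* Method 1: Nathan's method, unsigned 24-bit arithmetic. */
unsigned blend_method1(unsigned p, unsigned q, unsigned a)
{
    unsigned t = p * a + q * (255 - a);
    return (t + (t << 8) + 256) >> 16;
}

/* Method 3: Blinn's method, unsigned 16-bit arithmetic. */
unsigned blend_method3(unsigned p, unsigned q, unsigned a)
{
    unsigned t = p * a + q * (255 - a) + 128;
    return (t + (t >> 8)) >> 8;
}
```

For p = 3, q = 0, a = 128 the exact blend is 384/255, about 1.506. 
blend_method1(3, 0, 128) returns 1, the truncated value, while 
blend_method3(3, 0, 128) returns 2, the rounded value.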

At the moment the MMX code implements method 4, which is very fast. There 
is no point in implementing method 2, despite it being a little faster 
than method 4 (because of the simpler rounding), because it would require 
24-bit arithmetic instead of 16-bit, so fewer values could be multiplied 
at the same time.
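
To illustrate the 16-bit point, here is a sketch of method 4 with every 
intermediate explicitly truncated to 16 bits, as it would be in a 16-bit 
SIMD lane. It is plain C, not the MMX code from mmx_blend.S, and the 
function names are hypothetical. The check compares against an 
integer-only rounded reference rather than the floating-point one used in 
the attached program.

```c
#include <stdint.h>

/* Method 4 (Blinn's method, signed) using only 16-bit intermediates. */
uint8_t blend_method4_u16(uint8_t p, uint8_t q, uint8_t a)
{
    int bias = (p > q) ? 128 : -128;
    /* The signed product plus bias, reduced modulo 2^16. */
    uint16_t t = (uint16_t)(((int)p - (int)q) * (int)a + bias);
    /* Any carry out of bit 15 is dropped here; it would only affect
     * bit 8 of the shifted result, which the final 8-bit wrap discards. */
    uint16_t s = (uint16_t)(t + (t >> 8));
    return (uint8_t)((s >> 8) + q);
}

/* Integer-only rounded reference (hypothetical helper). */
unsigned blend_exact_rounded(unsigned p, unsigned q, unsigned a)
{
    unsigned x = p * a + q * (255 - a);
    return (2 * x + 255) / 510;
}

/* Count mismatches over all 256*256*256 combinations of p, q and a. */
unsigned count_method4_mismatches(void)
{
    unsigned p, q, a, mismatches = 0;

    for (p = 0; p <= 255; ++p)
        for (q = 0; q <= 255; ++q)
            for (a = 0; a <= 255; ++a)
                if (blend_method4_u16((uint8_t)p, (uint8_t)q, (uint8_t)a)
                    != blend_exact_rounded(p, q, a))
                    ++mismatches;
    return mismatches;
}
```

Because nothing wider than 16 bits is needed per channel, one 64-bit MMX 
register can hold four such intermediates, which is the advantage over 
the 24-bit method 2 described above.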

So, contrary to what I thought, there is no need to switch to method 1. 
When I implement the double blend trick I will have to use method 4, 
again for the same reasons as above.

But since the specs allow some tolerance, it would be nice to run the 
conformance tests with different settings in mmx_blend.S, especially the 
"single multiply w/o rounding" one, which would give at least a 5% 
improvement (probably a little more, because it would free some registers 
that could then hold some of the necessary constants).

For that it is only necessary to change

        #define GMBT_ROUNDOFF           0

leaving the rest as before

        #define GMBT_ALPHA_PLUS_ONE             0
        #define GMBT_GEOMETRIC_SERIES   1
        #define GMBT_SIGNED_ARITHMETIC  1

Using the alpha+1 method and not using the geometric series would be 
even faster, but it is already marked in the C code as rejected by glean...

> Thanks for you work!
> 
> -Brian
> 

Regards,

José Fonseca
#include <stdio.h>
#include <stdlib.h>

int main()
{
	unsigned short p, q, a;
	unsigned c1 = 0, c2 = 0, c3 = 0, c4 = 0;
	
	for (p = 0; p <= 255; ++p)
	for (q = 0; q <= 255; ++q)
	for (a = 0; a <= 255; ++a)
	{
		unsigned s;
		unsigned s1, s2, s3, s4;
		unsigned t1, t2, t3, t4;

#if 1
		// exact result - rounded
		s = (unsigned) (((double)p)*(((double)a)/255.0) + ((double)q)*(1.0-((double)a)/255.0) + 0.5);
#else
		// truncated result
		s = (unsigned) (((double)p)*(((double)a)/255.0) + ((double)q)*(1.0-((double)a)/255.0));
#endif

		// Nathan's method - unsigned 24bit arithmetic
		t1 = p*a + q*(255 - a);
		s1 = (t1 + (t1 << 8) + 256) >> 16;
		
		// Nathan's method - signed 24bit arithmetic
		t2 = (p - q)*a;
		s2 = (t2 + (t2 << 8) + 256) >> 16;
		s2 += q;
		s2 &= 0xff;
		
		// Blinn's method - unsigned 16bit arithmetic
		// NOTE: is exact
		t3 = p*a + q*(255-a) + 128;
		s3 = (t3 + (t3 >> 8)) >> 8;
		
		// Blinn's method - signed 16bit arithmetic
		// NOTE: is exact because the negative sign is considered
		t4 = ((p - q)*a + (p > q ? 128 : -128)) & 0xffff;
		s4 = (t4 + (t4 >> 8)) >> 8;
		s4 += q;
		s4 &= 0xff;
		
		if(s1 != s) ++c1;
		if(s2 != s) ++c2;
		if(s3 != s) ++c3;
		if(s4 != s) ++c4;
		if (s1 != s || s2 != s || s3 != s || s4 != s)
		{
//			printf("%3ux%3ux%3u:\t(%3u)\t%3u\t%3u\t%3u\t%3u\n", p, a, q, s, s1, s2, s3, s4);
		}
	}
	
	printf("1: %u differences in %u\n", c1, 256*256*256);
	printf("2: %u differences in %u\n", c2, 256*256*256);
	printf("3: %u differences in %u\n", c3, 256*256*256);
	printf("4: %u differences in %u\n", c4, 256*256*256);
	
	return 0;
}
