On Wed, Dec 20, 2000 at 02:34:19AM 0600, Federico Mena Quintero
<[EMAIL PROTECTED]> wrote:
> Anyways, in libart and gdkpixbuf we have code like this to composite
> an RGBA image over an RGB pixel:
>
> dest[0] = r2 + ((tmp + (tmp >> 8) + 0x80) >> 8);
Warning!
Here is a formula I just came up with (about the same as above, actually,
but without rounding errors):
x = ((n<<8) + n + 257)>>16;
It works over the full range (n = 0..65535; x = 0..257) and always gives
exactly n/255, i.e. the same result as C's truncating integer division.
However, this optimization is not as important as you might think: gcc
already uses the same technique for division by a constant. The difference
is that gcc uses a multiplication, since its formula has to work over the
full unsigned int range ;)
For n/255, gcc does this on x86:

    movl %ebx,%eax
    mull .LC0 ; = 0x80808081
    shrl $7,%edx

My formula boils down to:

    movl %ebx,%eax
    sall $8,%eax
    leal 257(%ebx,%eax),%eax
    shrl $16,%eax
In practice, gcc's code is faster if enough registers are available
(Pentium II/III), and usually not slower. It is also correct over the full
unsigned int range.
So think twice before starting to "optimize" this division.
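(And always remember to use UNSIGNED variables where applicable, since the
compiler can generate much faster code for them: signed division by a
constant needs extra correction instructions for negative values.)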

Marc Lehmann
[EMAIL PROTECTED]
XX11RIPE
The choice of a GNU generation
