Sven Neumann <[EMAIL PROTECTED]> wrote:
>The code is combining the multiplications done on 2 channels of the
>same pixel into one. It is also meant as an example of what can
>be done without using CPU-specific instructions.
Here's another example (4 x 8-bit saturated addition):
/* uint32 is assumed to be an unsigned integer type of exactly 32 bits */
uint32 padd_sat_4x8(uint32 a, uint32 b)
{
    uint32 ta, tb, tm, q, u, m;

    /* save overflow-causing bits (bit 7 of each field) in ta, tb */
    ta = a & 0x80808080;
    tb = b & 0x80808080;
    /* add the low 7 bits of each field; bit 7 of each field of q is the carry */
    q = a + b - (ta + tb);

    /* determine overflow conditions */
    tm = ta | tb;
    u = (ta & tb) | (q & tm);

    /* u now contains overflow bits, propagate them over fields */
    m = (u << 1) - (u >> 7);
    return (q + tm - u) | m;
}
This is completely portable, and should be a good deal faster than
conditionally adding each component separately, at least on modern
superscalar machines where mispredicted branches are expensive.
Benchmarks confirm this.
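
For comparison, "conditionally adding each component separately" could look roughly like the sketch below. It is only an illustration of the per-component approach, not code from GIMP. The name padd_sat_4x8_ref is made up, and uint32_t from <stdint.h> stands in for uint32.

```c
#include <stdint.h>

/* Hypothetical per-component reference: add each 8-bit field on its
   own and clamp the sum to 255 with a conditional. */
uint32_t padd_sat_4x8_ref(uint32_t a, uint32_t b)
{
    uint32_t result = 0;
    int shift;

    for (shift = 0; shift < 32; shift += 8) {
        /* extract one field from each operand and add them */
        uint32_t sum = ((a >> shift) & 0xFF) + ((b >> shift) & 0xFF);

        /* saturate: this is the branch the packed version avoids */
        if (sum > 0xFF)
            sum = 0xFF;

        result |= sum << shift;
    }
    return result;
}
```

A function like this is also handy as a check: feeding the same inputs to both versions should give identical results.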
Extending padd_sat_4x8 to 8 x 8-bit (using 64-bit integers) is trivial, of course.
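
A minimal sketch of that 64-bit variant, assuming uint64_t from <stdint.h> is available. The name padd_sat_8x8 is made up for illustration. Only the mask width changes, the steps are the same as in padd_sat_4x8.

```c
#include <stdint.h>

/* Hypothetical 8 x 8-bit saturated addition: same steps as
   padd_sat_4x8, with the bit-7 mask widened to 64 bits. */
uint64_t padd_sat_8x8(uint64_t a, uint64_t b)
{
    const uint64_t hi = UINT64_C(0x8080808080808080);
    uint64_t ta, tb, tm, q, u, m;

    /* save overflow-causing bits (bit 7 of each field) in ta, tb */
    ta = a & hi;
    tb = b & hi;
    /* add the low 7 bits of each field; bit 7 of q is the carry */
    q = a + b - (ta + tb);

    /* determine overflow conditions */
    tm = ta | tb;
    u = (ta & tb) | (q & tm);

    /* expand each overflow bit into 0xFF for its field; the top
       field relies on unsigned wrap-around modulo 2^64 */
    m = (u << 1) - (u >> 7);
    return (q + tm - u) | m;
}
```

For example, padd_sat_8x8(0xFF7F800100FE4010, 0x01018001FF027020) should give 0xFF80FF02FFFFB030: the fields that would exceed 255 are clamped to 0xFF and the rest are plain sums.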
_______________________________________________
Gimp-developer mailing list
[EMAIL PROTECTED]
http://lists.xcf.berkeley.edu/mailman/listinfo/gimp-developer