Sven Neumann <[EMAIL PROTECTED]> wrote:
>The code is combining the multiplications done on 2 channels of the 
>same pixel into one. It is also meant as an example of what can
>be done without using CPU-specific instructions.
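For readers who haven't seen the code being discussed, here is a minimal sketch of the general trick. It is not the GIMP code. It assumes four 8-bit channels packed into a 32-bit word (byte 0 lowest) and an 8-bit scale factor f, and it uses a made-up helper name, scale_4x8. Masking out every other channel leaves 8 bits of headroom above each remaining one, so one multiplication scales two channels of the same pixel at once:

```c
#include <stdint.h>

/* Hypothetical helper: scale each 8-bit channel of a packed pixel by
 * f/256 (a cheap approximation of f/255) using two multiplications
 * instead of four. Assumes f <= 0xff. */
uint32_t scale_4x8(uint32_t pixel, uint32_t f)
{
    /* channels 0 and 2: each product is at most 0xff * 0xff = 0xfe01,
     * which fits in the 16 bits available to it, so the two don't collide */
    uint32_t rb = ((pixel & 0x00ff00ffu) * f >> 8) & 0x00ff00ffu;
    /* channels 1 and 3: shift down first so the upper product stays within
     * 32 bits; the high byte of each product lands in the channel's slot */
    uint32_t ga = (((pixel >> 8) & 0x00ff00ffu) * f) & 0xff00ff00u;
    return rb | ga;
}
```

For example, scale_4x8(0x80402010, 0x80) halves every channel, giving 0x40201008.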

Here's another example (4 x 8-bit saturated addition):

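#include <stdint.h>

/* Add four packed unsigned 8-bit values, saturating each at 0xff. */
uint32_t padd_sat_4x8(uint32_t a, uint32_t b)
{
    uint32_t ta, tb, tm, q, u, m;
    /* save overflow-causing bits (top bit of each byte) in ta, tb */
    ta = a & 0x80808080;
    tb = b & 0x80808080;
    /* add the low 7 bits of each byte; no carry crosses a byte boundary */
    q = a + b - (ta + tb);
    /* determine overflow conditions */
    tm = ta | tb;
    u = (ta & tb) | (q & tm);
    /* u now contains overflow bits; propagate them over fields */
    m = (u << 1) - (u >> 7);
    return (q + tm - u) | m;
}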

This is completely portable, and should be a good deal faster than
conditionally adding each component separately, at least on modern
superscalar machines where mispredicted branches are expensive.
Benchmarks confirm this.
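For comparison, here is a sketch of the straightforward per-component version that the branch-free code replaces. The name padd_sat_4x8_ref is made up for this example. A compiler may turn the clamp into a conditional move, so whether a real branch survives depends on the compiler and flags:

```c
#include <stdint.h>

/* Hypothetical reference version: unpack each byte, add, clamp with a
 * conditional, and repack. */
uint32_t padd_sat_4x8_ref(uint32_t a, uint32_t b)
{
    uint32_t result = 0;
    for (int shift = 0; shift < 32; shift += 8) {
        uint32_t s = ((a >> shift) & 0xff) + ((b >> shift) & 0xff);
        if (s > 0xff)      /* data-dependent branch */
            s = 0xff;
        result |= s << shift;
    }
    return result;
}
```

Because it is easy to reason about, a version like this is also handy for checking the bit-twiddling code against, e.g. over random inputs.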

Extending the above to 8 x 8-bit (using 64-bit integers) is, of course, trivial.
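A minimal sketch of that 64-bit version, assuming a C99 compiler with uint64_t (padd_sat_8x8 is a hypothetical name). Only the mask constant changes; the steps are the same as above:

```c
#include <stdint.h>

/* 8 x 8-bit saturated addition: padd_sat_4x8 with the masks widened. */
uint64_t padd_sat_8x8(uint64_t a, uint64_t b)
{
    const uint64_t hi = 0x8080808080808080ull;
    uint64_t ta = a & hi, tb = b & hi;
    uint64_t q = a + b - (ta + tb);   /* low 7 bits of each byte, no cross-byte carry */
    uint64_t tm = ta | tb;
    uint64_t u = (ta & tb) | (q & tm); /* 0x80 in every byte that overflowed */
    /* (u << 1) - (u >> 7) turns each 0x80 into 0xff; for the top byte the
     * shifted-out bit is lost, which unsigned wraparound makes harmless */
    uint64_t m = (u << 1) - (u >> 7);
    return (q + tm - u) | m;
}
```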

Gimp-developer mailing list
