> Here are some results for different CPU (time measured by encoding
> 100000 sectors with non zero data including scrambling):
>
> I'll put the new lec.cc sources to the patches section of SF.
Why
    d = *p_lsb;
    p0_lsb ^= (*coeffs0)[d];
    p1_lsb ^= (*coeffs1)[d];
    d = *p_msb;
    p0_msb ^= (*coeffs0)[d];
    p1_msb ^= (*coeffs1)[d];
    coeffs0++;
    coeffs1++;
    p_lsb += 2 * 43;
    p_msb += 2 * 43;
and not (provided that coeffs01 is a pointer to an array of pointers to
[256][2] matrices)
    d = *p_lsb;
    p0_lsb ^= (*coeffs01)[d][0];
    p1_lsb ^= (*coeffs01)[d][1];
    d = *p_msb;
    p0_msb ^= (*coeffs01)[d][0];
    p1_msb ^= (*coeffs01)[d][1];
    coeffs01++;
    p_lsb += 2 * 43;
    p_msb += 2 * 43;
or even (provided that short_coeffs01 is [originally] a pointer to a
[43][256] matrix of shorts)
    d0 = *p_lsb;
    d1 = *(p_lsb + 1);
    short_p01_lsb ^= short_coeffs01[d0];
    short_p01_msb ^= short_coeffs01[d1];
    short_coeffs01 += 256;  /* next row of the [43][256] matrix */
    p_lsb += 2 * 43;
I.e. it is gentler on the cache and goes from 8 loads down to 4. The
point of the last variant is that it requires no more than 7 registers,
which fits the IA-32 register bank perfectly.
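
To illustrate why the packed variant gives the same result, here is a
minimal self-contained sketch. It is not the lec.cc code: demo_coeff(),
build_packed(), Parity and both parity_*() functions are hypothetical
names, the table contents are placeholder values rather than the real
coefficients, and the data is read with stride 1 instead of 2 * 43 to
keep it short. It only shows that one 16-bit lookup and XOR per data
byte matches two separate 8-bit lookups.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::size_t kRows = 43;   // one table row per data byte in a column
constexpr std::size_t kVals = 256;  // one entry per possible byte value

// Placeholder coefficient function (assumption, NOT the real LEC tables).
inline std::uint8_t demo_coeff(std::size_t table, std::size_t row, std::size_t d) {
    return static_cast<std::uint8_t>((d * (2 * row + 3) + 7 * table + row) & 0xFF);
}

// Packed [43][256] table: low byte holds the coeffs0 entry,
// high byte holds the coeffs1 entry.
inline std::vector<std::uint16_t> build_packed() {
    std::vector<std::uint16_t> t(kRows * kVals);
    for (std::size_t r = 0; r < kRows; ++r)
        for (std::size_t d = 0; d < kVals; ++d)
            t[r * kVals + d] = static_cast<std::uint16_t>(
                demo_coeff(0, r, d) | (demo_coeff(1, r, d) << 8));
    return t;
}

struct Parity { std::uint8_t p0, p1; };

// Reference: two separate byte lookups per data byte.
inline Parity parity_separate(const std::uint8_t* data) {
    Parity p{0, 0};
    for (std::size_t r = 0; r < kRows; ++r) {
        p.p0 ^= demo_coeff(0, r, data[r]);
        p.p1 ^= demo_coeff(1, r, data[r]);
    }
    return p;
}

// Packed: one 16-bit lookup and one XOR per data byte.
inline Parity parity_packed(const std::vector<std::uint16_t>& t,
                            const std::uint8_t* data) {
    assert(t.size() == kRows * kVals);
    const std::uint16_t* row = t.data();
    std::uint16_t p01 = 0;
    for (std::size_t r = 0; r < kRows; ++r) {
        p01 ^= row[data[r]];
        row += kVals;  // step to the next row of the packed table
    }
    return Parity{static_cast<std::uint8_t>(p01 & 0xFF),
                  static_cast<std::uint8_t>(p01 >> 8)};
}
```

Since XOR works bit by bit, the two halves of the 16-bit accumulator
never interfere, so splitting it at the end recovers both parity bytes.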
Another way to give the compiler more room to optimize would be to
declare the tables as const. This implies that the tables have to be
wrapped into classes [as const instances can be initialized from class
constructors only]. This would also make lec_init() obsolete.
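
A rough sketch of what I mean by wrapping a table into a class, under
stated assumptions: DemoTable, demo_table and xor_fold() are
hypothetical names, and the table values are placeholders, not the real
coefficients. The constructor fills the table, so the const instance is
ready to use without any explicit init call.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical wrapper: the table is filled once in the constructor,
// so a const instance needs no separate lec_init()-style call.
class DemoTable {
public:
    DemoTable() {
        for (unsigned d = 0; d < 256; ++d)
            v_[d] = static_cast<std::uint8_t>(d ^ (d >> 1));  // placeholder values
    }
    std::uint8_t operator[](std::uint8_t d) const { return v_[d]; }
private:
    std::uint8_t v_[256];
};

// Const instance at file scope, constructed during static initialization.
static const DemoTable demo_table;

// Example consumer: XOR together the table entries for a byte buffer.
inline std::uint8_t xor_fold(const std::uint8_t* data, std::size_t n) {
    assert(data != nullptr || n == 0);
    std::uint8_t acc = 0;
    for (std::size_t i = 0; i < n; ++i)
        acc ^= demo_table[data[i]];
    return acc;
}
```

One caveat with this approach: code in other translation units that
runs during static initialization cannot rely on the table being
constructed yet.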
Is it OK like this or should I submit working code?
Cheers. A.