Hi,

> In burn_rspc_mult:
> Here is a more bullet-proof version of the second test:
>   (((int)a - 1) | ((int)b - 1)) < 0

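For readers following along, the quoted test can be checked in isolation.
The following is only a minimal sketch, assuming an 8-bit unsigned char
and a two's complement int. The names either_is_zero() and
count_mismatches() are hypothetical and not taken from the code under
discussion.

```c
#include <assert.h>

/* Hypothetical helper: nonzero if a == 0 or b == 0.
   (int)x - 1 is -1 only for x == 0, else it is in 0..254.
   OR-ing both values keeps the sign bit set if either one is negative
   (this relies on two's complement representation). */
int either_is_zero(unsigned char a, unsigned char b)
{
    return (((int)a - 1) | ((int)b - 1)) < 0;
}

/* Compare the branch-free test with the plain one for all byte pairs.
   Returns the number of pairs where both tests disagree. */
int count_mismatches(void)
{
    int a, b, bad = 0;

    for (a = 0; a < 256; a++)
        for (b = 0; b < 256; b++)
            if (either_is_zero((unsigned char)a, (unsigned char)b)
                != (a == 0 || b == 0))
                bad++;
    return bad;
}
```

The expression replaces two comparisons by one, which may or may not help
once the compiler has optimized the plain form on its own.
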
Bullet-proof is more important than speed.
Whatever, all variations had only negative effects.
(I measure 10 * 800 MB in about 110 seconds. The negative effects are
in the range of 10 to 20 seconds.)

>
> static unsigned char burn_rspc_div_3(unsigned char a)
> Given that gfpow is doubled, this code should be faster and simpler:

Stupid me. The -25 case is unneeded, indeed. :))

Nevertheless the overall impact of the division is quite low.
For 2352 bytes it happens 69 times. There are more than 4000
multiplications which each need about the same time as a division.

My compliments for your knowledge about code optimizations.
Your ideas are more elaborate than mine.

It seems that gcc -O2 works best if one does not try to squeeze small
details. (I would assume this is a property of the compiler and not so
much of the processor.)

Significant results came from:
- Unrolling gfpow[] to 509 elements.
- Uniting two pairs of loops which shared nearly the same index
  computation.
Anything else had no effect or even hampered -O2.

I made some of the neutral simplifications nevertheless. It cannot
harm to have fewer C statements in the code.

Besides the remaining opportunities for parallelisation maybe a less
highschoolish method of solving HxV=0 might lead to better results.

Have a nice day :)

Thomas
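
The doubled gfpow[] table mentioned above can be sketched as follows.
This is an illustration under assumptions, not the code under discussion.
It assumes the GF(2^8) field polynomial x^8 + x^4 + x^3 + x^2 + 1 (0x11d)
with generator 2, as ECMA-130 specifies for CD sector error correction.
The table gflog[] and all sketch_* and slow_* names are hypothetical.
With 509 entries, the sum of two logarithms (at most 254 + 254 = 508)
indexes the table directly, so no reduction modulo 255 is needed.

```c
#include <assert.h>

static unsigned char gfpow[509]; /* powers of 2, period 255, unrolled */
static unsigned char gflog[256]; /* discrete logarithm, gflog[0] unused */

/* Fill both tables. Assumed field polynomial: 0x11d. */
void sketch_init(void)
{
    int i, v = 1;

    for (i = 0; i < 509; i++) {
        gfpow[i] = (unsigned char)v;
        if (i < 255)
            gflog[v] = (unsigned char)i;
        v <<= 1;
        if (v & 0x100)
            v ^= 0x11d;
    }
}

/* Table multiplication: the index is at most 508, no modulo needed. */
unsigned char sketch_mult(unsigned char a, unsigned char b)
{
    if (a == 0 || b == 0)
        return 0;
    return gfpow[gflog[a] + gflog[b]];
}

/* Division by 3: adding 255 keeps the index non-negative,
   so a separate case for a negative difference is not needed. */
unsigned char sketch_div_3(unsigned char a)
{
    if (a == 0)
        return 0;
    return gfpow[gflog[a] + 255 - gflog[3]];
}

/* Reference: bitwise shift-and-xor multiplication in the same field. */
unsigned char slow_mult(unsigned char a, unsigned char b)
{
    int r = 0, x = a, i;

    for (i = 0; i < 8; i++) {
        if (b & (1 << i))
            r ^= x;
        x <<= 1;
        if (x & 0x100)
            x ^= 0x11d;
    }
    return (unsigned char)r;
}

/* Initializes the tables and returns the number of mismatches
   between table results and the reference. */
int sketch_self_check(void)
{
    int a, b, bad = 0;

    sketch_init();
    for (a = 0; a < 256; a++) {
        for (b = 0; b < 256; b++)
            if (sketch_mult((unsigned char)a, (unsigned char)b)
                != slow_mult((unsigned char)a, (unsigned char)b))
                bad++;
        if (sketch_mult(sketch_div_3((unsigned char)a), 3) != a)
            bad++;
    }
    return bad;
}
```

In this assumed field the logarithm of 3 is 25, so the largest index used
by sketch_div_3() is 254 + 255 - 25 = 484, well inside the 509 entries.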

