As I'm looking through the rest of the code, I notice a few silly things. For example, couldn't rand_r save a jump by declaring it
int rand_r(unsigned long *ctx) __attribute__((alias("do_rand"))); But the real problem is the choice of algorithm. 32x32-bit multiplies are very ugly and bloated on an 8-bit AVR. I could replace it with an 8-bit lag-3 multiply-with-carry generator with the same period (2^31 - epsilon) and state size (32 bits) which would be a lot faster. But do people depend on that exact sequence of values being generated? If they are, there's still room for improvement. You could at least replace the hideous mess of 32-bit divides with a two 16x16->32-bit multiplies (since the multiplier 16807 fits into 16 bits) and some simpler modular reduction code. I.e. uint32_t x = *ctx; uint32_t high = (x>>16) * 16807ul; x = (uint16_t)x * 16807ul; /* The 48-bit product is (high << 16) + x; reduce mod 2^31-1 */ high += x >> 16; x = (uint16_t)x + (high << 16); x &= 0x7fffffff; x += high << 1 >> 16; x += x >> 31; x &= 0x7ffffffff; *ctx = x; _______________________________________________ AVR-libc-dev mailing list AVR-libc-dev@nongnu.org https://lists.nongnu.org/mailman/listinfo/avr-libc-dev