Perhaps the two different reduction-mod-5 schemes should depend on OPTIMIZE_SPEED?
Speaking of optimization, there are a significant number of places in libc/string (and libc/pmstring) where adding one instruction could save one cycle per byte.

Most of the loops end with

    subi len_lo, lo8(1)
    sbci len_hi, hi8(1)
    brcc .L_loop

The first thing that leaps to mind is that r25:r24 is free (since the first argument is a pointer which gets moved to X or Z), so adding one movw to the top would let the loop be counted with sbiw. But although that saves an instruction, sbiw takes 2 cycles, so we've gained nothing in the loop and lost one cycle to the movw.

However, there's a second trick, and this one does work. When the amount subtracted fits into one byte, you can write:

    subi len_lo, lo8(1)
    brcc .L_loop
    sbci len_hi, 0
    brcc .L_loop

That's one instruction more, but 255 times out of 256, the loop will be one cycle shorter. (1 time in 256, including the last time, the loop will be one cycle *longer*.)

What's nice about this is that it's easy to define a macro (O_brcc?) which expands to nothing if OPTIMIZE_SPEED is off.

Anyway, back to what I'm supposed to be working on. I'm just poking around the code to familiarize myself with it and noticing low-hanging fruit.
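
To sanity-check the cycle counts for the two loop endings discussed above, here is a rough C model of them. It is only a sketch: it assumes the classic AVR timings (subi and sbci one cycle each, brcc two cycles when taken and one when not), it counts only the loop-control instructions and not the loop body, and the function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Cycles spent in "subi / sbci / brcc" over a whole loop.
 * The loop runs until the 16-bit subtraction borrows,
 * i.e. count + 1 passes in total. */
unsigned long tail_cycles_plain(uint16_t count)
{
    unsigned long cycles = 0;
    uint16_t len = count;

    for (;;) {
        int borrow = (len == 0);        /* carry set after sbci */
        len--;
        cycles += 2;                    /* subi + sbci */
        if (borrow) {
            cycles += 1;                /* brcc not taken: loop ends */
            break;
        }
        cycles += 2;                    /* brcc taken */
    }
    return cycles;
}

/* Cycles spent in "subi / brcc / sbci / brcc" over a whole loop. */
unsigned long tail_cycles_fast(uint16_t count)
{
    unsigned long cycles = 0;
    uint16_t len = count;

    for (;;) {
        int lo_borrow = ((len & 0xff) == 0);  /* carry set after subi */
        int borrow = (len == 0);              /* carry set after sbci */
        len--;
        cycles += 1;                    /* subi */
        if (!lo_borrow) {
            cycles += 2;                /* first brcc taken: 3 cycles total */
            continue;
        }
        cycles += 1;                    /* first brcc falls through */
        cycles += 1;                    /* sbci */
        if (borrow) {
            cycles += 1;                /* second brcc not taken: loop ends */
            break;
        }
        cycles += 2;                    /* second brcc taken: 5 cycles total */
    }
    return cycles;
}

/* Spot checks worked out by hand from the timings above:
 * 255 taken passes plus the final pass. */
void check_model(void)
{
    assert(tail_cycles_plain(255) == 255 * 4 + 3);   /* 1023 */
    assert(tail_cycles_fast(255)  == 255 * 3 + 4);   /*  769 */
}
```

In this model a taken pass costs 4 cycles with the original ending, versus 3 cycles (low byte did not borrow) or 5 cycles (low byte borrowed) with the extra brcc, and the final pass costs 4 instead of 3.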