Perhaps the two different reduction-mod-5 schemes should depend on
OPTIMIZE_SPEED?

Speaking of optimization, there are quite a few places in libc/string
(and libc/pmstring) where adding one instruction could save one cycle
per byte.

Most of the loops end with

        subi    len_lo, lo8(1)
        sbci    len_hi, hi8(1)
        brcc    .L_loop

The first thing that leaps to mind is that r25:r24 is free (since the
first argument is a pointer which gets moved to X or Z), so adding one
movw to the top would let the loop be counted with sbiw.

But although that saves an instruction, sbiw takes 2 cycles, the same as
the subi/sbci pair it replaces, so we've gained nothing in the loop and
lost one cycle to the movw.


However, there's a second trick, and this one does work.  When the amount
subtracted fits into one byte, you can write:

        subi    len_lo, lo8(1)
        brcc    .L_loop
        sbci    len_hi, 0
        brcc    .L_loop

That's one instruction more, but on 255 of every 256 passes the first
brcc is taken and the loop is one cycle shorter.  (On the remaining 1 in
256, including the final pass, the low byte borrows and the loop is one
cycle *longer*.)
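
To sanity-check the arithmetic, here is a rough C model of the three loop
endings discussed above.  It is only a sketch: it counts the cycles spent
in the loop-ending instructions (not the loop body), it assumes the usual
AVR timings (subi, sbci and movw 1 cycle, sbiw 2 cycles, brcc 2 cycles
when taken and 1 when not), and the function names are made up for
illustration.  A counter that starts at len makes len+1 passes, since
brcc keeps looping until the subtraction borrows.

```c
#include <stdint.h>

/* Original ending: subi / sbci / brcc. */
unsigned long cycles_subi_sbci(uint16_t len)
{
    unsigned long cycles = 0;
    for (;;) {
        int borrow = (len == 0);    /* carry set when the 16-bit count wraps */
        len--;
        cycles += 2;                /* subi + sbci */
        if (borrow) {
            cycles += 1;            /* brcc falls through, loop exits */
            return cycles;
        }
        cycles += 2;                /* brcc taken */
    }
}

/* First idea: one movw up front, then sbiw / brcc. */
unsigned long cycles_sbiw(uint16_t len)
{
    unsigned long cycles = 1;       /* the extra movw */
    for (;;) {
        int borrow = (len == 0);
        len--;
        cycles += 2;                /* sbiw */
        if (borrow) {
            cycles += 1;            /* brcc falls through */
            return cycles;
        }
        cycles += 2;                /* brcc taken */
    }
}

/* Second trick: subi / brcc / sbci / brcc. */
unsigned long cycles_split(uint16_t len)
{
    unsigned long cycles = 0;
    for (;;) {
        int borrow_lo = ((len & 0xff) == 0);  /* low byte wraps */
        int borrow_hi = (len == 0);           /* whole count wraps */
        len--;
        cycles += 1;                /* subi */
        if (!borrow_lo) {
            cycles += 2;            /* first brcc taken: the common case */
            continue;
        }
        cycles += 2;                /* first brcc not taken, then sbci */
        if (borrow_hi) {
            cycles += 1;            /* second brcc falls through */
            return cycles;
        }
        cycles += 2;                /* second brcc taken */
    }
}
```

For a counter starting at 999 (1000 passes) the model gives 3999 cycles
for the original ending, 4000 with sbiw, and 3007 for the split version:
996 passes save a cycle and 4 passes (low byte 0x00 at 768, 512, 256 and
0) lose one.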

What's nice about this is that it's easy to define a macro (O_brcc?) which
expands to nothing if OPTIMIZE_SPEED is off.

Anyway, back to what I'm supposed to be working on.  I'm just poking around
the code to familiarize myself with it and noticing low-hanging fruit.

_______________________________________________
AVR-libc-dev mailing list
AVR-libc-dev@nongnu.org
https://lists.nongnu.org/mailman/listinfo/avr-libc-dev
