> The reworked version comes up with 110 bytes (still asserting MUL).

Nicely done.

> perf-metering with avrtest reveals a run time from ~3100 up to < 4800 
> ticks; high as expected.

Mine, for comparison, is 3161 cycles worst case (64 ones), or 4045 if !MUL.

So yours is actually not too unreasonable *if* the numbers are
uniformly distributed.  With more common distributions which obey
Benford's law, of course, it's pretty awful speed-wise.
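
To put a rough number on the "uniformly distributed" case, here is a small
sketch (Python, not part of either implementation) that counts what share of
all 64-bit values has each decimal digit count. It assumes, as a
simplification not stated in this thread, that digit count is what separates
a cheap input from an expensive one; the names TOTAL and digit_count_share
are made up for the illustration.

```python
from fractions import Fraction

# Number of distinct unsigned 64-bit values.
TOTAL = 2**64

def digit_count_share(digits):
    """Exact share of uniformly distributed 64-bit values that print
    with exactly `digits` decimal digits (hypothetical helper)."""
    lo = 0 if digits == 1 else 10 ** (digits - 1)   # smallest such value
    hi = min(10 ** digits, TOTAL)                   # one past the largest
    return Fraction(max(hi - lo, 0), TOTAL)

# A 64-bit value has between 1 and 20 decimal digits.
shares = {d: digit_count_share(d) for d in range(1, 21)}

for d in (18, 19, 20):
    print(d, "digits:", round(float(shares[d]), 4))
print("19 or 20 digits:", round(float(shares[19] + shares[20]), 4))
```

Under a uniform distribution roughly 95% of inputs have 19 or 20 digits, so
nearly every call sits close to the worst case anyway. Real-world values
skew small, which is where the gap shows up.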

I really wish I could find a way to skip the totally unnecessary final
multiplication of 0 * 10, without adding one extra instruction.

One slight speed saving: "Ten" is never overwritten anywhere in the
function.  You can load it once in the preamble and leave it.

Or you could get rid of the Ten register entirely, saving a spill/fill
(106 bytes!), and instead do "ldi r0,10" in the multiplication loop.
