On 04.12.2016 23:53, George Spelvin wrote:
The developers of the TICC time interval counter are having
a problem printing 64-bit integers:
https://github.com/TAPR/TICC/blob/master/TO-DO.txt
I went to work figuring out how to do that conversion, and after some
false starts, came up with some quite efficient arbitrary-precision
printing code. It's small, fast, and works for any number of bytes.
(Implementation currently limited to 255 bytes.)
The code, in its current state, is appended; it's been tested extensively
on x86 and more lightly on simulavr. Any comments from experienced AVR
programmers is definitely appreciated. Like the current __ultoa_invert,
it formats the number backward in a caller-provided buffer.
It's (currently) slightly larger, but slightly faster than __ultoa_invert.
Timing in cycles for a simulated atmega1280:
Input genprint __ultoa_invert
0 160
0xff 316 480
0xffff 584 792
0xffffff 1005 1260
0xffffffff 1434 1572
0xffffffffff 2024
48 ones 2626
56 ones 3286
64 ones 4103
I could just pass this over to the TICC project, but is there any interest
in me making the necessary overhauls to vfprintf to incorporate this?
I'd have to change vfprintf to pass around a pointer and length internally
rather than copying the value. The printing code requires the input
in a modifiable little-endian buffer, but that's how varargs works on
AVR anyway. (Adding the hex/octal and negative number handling is quite
simple.)
It it possible to do it with the same code and not special-casing
different radices? Even if the code runs slower (which is not really
important for output routines) the code size might drop.
If I get really sneaky, I could eliminate the output buffer by having
it push the formatted number on the stack and call down to an output
routine that supplies the necessary prefix padding once the length is
known. But that might not be doable in C unless there's a way to force
allocation of a stack frame.
(If anyone cares, I have some other code for 16- and 32-bit numbers based
on the ideas at http://homepage.divms.uiowa.edu/~jones/bcd/decimal.html.
It converts 2 digits at a time, using a 100-byte binary-to-BCD lookup
table, but is a lot larger, extending it to 64 bits is a mess, and doing
without a hardware multiplier bloats it with shift-and-add sequences.)
Usually, we are after code-size, not speed; in particular for such
output and conversion routines.
#if __AVR_ARCH__
In the library you'll have to protect the code against AVR_TINY, i.e.
#ifndef __AVR_TINY__
typedef _Bool bool;
typedef unsigned char uint8_t;
typedef unsigned short uint16_t;
#define false (bool)0
#define true (bool)1
Dunno how libc's modules are compiled, but they are using stdint.h all
over the place, so stdbool.h and stdint.h should be in order, no?
/* Reduce a byte mod 5. */
static uint8_t __attribute__((const))
mod5(uint8_t x)
{
#if __AVR_ARCH__
uint8_t tmp;
/* Reduce x mod 15. We add the high halves, to get a carry. */
asm( "mov __tmp_reg__,%0"
"\n swap __tmp_reg__"
"\n cbr %0,15"
"\n add %0,__tmp_reg__"
"\n cbr %0,15"
"\n swap %0" /* Swap and add carry bit to low */
"\n adc %0,__zero_reg__" /* End-around carry */
: "+d" (x) :: "r0");
#else
x = (x >> 4) + (x & 0xf); /* 1 <= x <= 30 */
x = (x >> 4) + (x & 0xf); /* 1 <= x <= 15 */
#endif
/* 1 <= x <= 15; now for final reduction mod 5 */
if (x >= 10)
x -= 10; /* 0 <= x <= 9 */
if (x >= 5)
x -= 5; /* 0 <= x <= 4 */
return x;
}
Again, we can safe code size by slightly slowing things down, e.g.
mod5 (uint8_t x)
{
#if __AVR_ARCH__
asm ("0: $ subi %0,%1 $ brcc 0b $ subi %0,%n1" : "+d" (x) : "n" (35));
asm ("0: $ subi %0,%1 $ brcc 0b $ subi %0,%n1" : "+d" (x) : "n" (5));
return x;
#else
...
The intermediate step via 35 is not essential, it's just a speed-up.
do {
uint16_t accum;
uint8_t i = len;
bool lsbit = false;
/*
* Pass 1, msb-to-lsb: divide bin[] by 2, and return the
* value mod 10 (the new value mod 5, combined with the
* previous lsbit).
*/
digit = 255; /* Could also be zero */
If 0 is just as fine, use 0. I am getting better code size and less
stack usage (using -mcall-prologues).
#if __AVR_ARCH__
/*
* The one implementation hassle is the need for two
* carry bits. One for the shift, and the other for
* the end-around carries modulo 255.
*
* Unfamiliar AVR inline asm feature: given a memory
* constraint, %2 prints X, Y or Z. %E2 and %F2 are the
* low and high registers of the pointer, respectively.
* So if %2 = Z, then %E2 = r30 and %F2 = r31.
*/
asm(""
"\n1: ld __tmp_reg__,-%4"
"\n lsr %1" /* lsbit to carry bit */
"\n ror __tmp_reg__"
"\n st %4,__tmp_reg__"
"\n rol %1" /* carry bit to lsbit */
"\n add %0,__tmp_reg__"
"\n adc %0,__zero_reg__" /* End-around carry */
"\n dec %2"
"\n brne 1b"
"\n.ifnc %A3,%E4"
"\n .error \"GCC constraints not working as expected.\""
"\n.endif"
"\n.ifnc %B3,%F4"
"\n .error \"GCC constraints not working as expected.\""
"\n.endif"
: "+&r" (digit), "+&r" (lsbit), "+&r" (i), "+e" (bin)
: "m" (*bin)
: "r0");
I don't quite get this, shouldn't this be something like
asm ("\n1: ld __tmp_reg__,-%a3"
"\n lsr %1" /* lsbit to carry bit */
"\n ror __tmp_reg__"
"\n st %a3,__tmp_reg__"
"\n rol %1" /* carry bit to lsbit */
"\n add %0,__tmp_reg__"
"\n adc %0,__zero_reg__" /* End-around carry */
"\n dec %2"
"\n brne 1b"
: "+r" (digit), "+r" (lsbit), "+r" (i), "+e" (bin)
:
: "memory");
IMO it's not worth the hassle to describe explicitly which portion of
memory gets changed, hence "memory".
Try to avoid "m" constraints in your code, except for making side
effects on memory explicit, i.e. don't use the respective %-Operand in
the asm template as "m" allows much more address modes than any AVR
instruction can chew.
if (digit > 4)
__builtin_unreachable();
What's the purpose of this test? An optimization hint?
#if __AVR_ARCH__
#define READ_PORT 0x38
#define WRITE_PORT 0x39
#define EXIT_PORT 0x3a
static uint8_t inline __attribute__((always_inline))
inb(uint8_t port)
{
uint8_t x;
asm("in %0,%1" : "=r" (x) : "I" (port));
return x;
}
static void inline __attribute__((always_inline))
outb(uint8_t port, uint8_t x)
{
asm("out %0,%1" :: "I" (port), "r" (x));
}
static void
putch(char c)
{
outb(WRITE_PORT, c);
}
static void
revwrite(char const *p, uint8_t len)
{
while (len--)
putch(*--p);
}
static void
revput(char *buf, uint8_t *bin, uint8_t len)
{
char const *p = genprint(buf, bin, len);
revwrite(p, p-buf);
putch('\n');
}
static void
putst(char const __flash *p)
{
char c;
while ((c = *p++) != 0)
putch(c);
putch('\n');
}
static void __attribute__((noreturn))
my_exit(uint8_t status)
{
outb(EXIT_PORT, status);
__builtin_unreachable();
}
FYI, avrtest comes with such functions for you, you just have to link
against, say, exit-atmega128.o
https://sourceforge.net/p/winavr/code/HEAD/tree/trunk/avrtest/
avrtest is just a core simulator, i.e. has reduced functionality
compared to other simulators -- but it's /fast/ and well suited for such
algorithmic torture tests.
Johann
_______________________________________________
AVR-libc-dev mailing list
AVR-libc-dev@nongnu.org
https://lists.nongnu.org/mailman/listinfo/avr-libc-dev