Re: [avr-gcc-list] [avr-libc]: Smarter parity implementation

Georg-Johann Lay Mon, 01 Aug 2011 06:11:23 -0700

Paulo Marques wrote:
> Georg-Johann Lay wrote:
>> The current implementation of parity in avr-libc compiles
>>
>> #include <stdint.h>
>> #include <util/parity.h>
>>
>> uint8_t pari1 (uint8_t val)
>> {
>>     return parity_even_bit (val);
>> }
>>
>> with -Os to
>>
>> pari1:
>> /* #APP */
>>     mov __tmp_reg__,r24
>>     swap r24
>>     eor r24,__tmp_reg__
>>     mov __tmp_reg__,r24
>>     lsr r24
>>     lsr r24
>>     eor r24,__tmp_reg__
>> /* #NOAPP */
>>     ldi r25,lo8(0)
>>     adiw r24,1
>>     asr r25
>>     ror r24
>>     andi r24,lo8(1)
>>     ret
>>
>> which are 12 instructions, 13 ticks and 2 regs.
>>
>> Parity can be implemented with 9 instructions
>> and 9 ticks and one reg. Instead of r24 any d-reg
>> can be used:
>>
>> ;; r24 = parity8 (r24)
>> ;; clobbers: __tmp_reg__
>> parity8:
>>     ;; parity is in r24[0..7]
>>     mov  __tmp_reg__, r24
>>     swap __tmp_reg__
>>     eor  r24, __tmp_reg__
>>     ;; parity is in r24[0..3]
>>     subi r24, -4
>>     andi r24, -5
>>     subi r24, -6
>>     ;; parity is in r24[0,3]
>>     sbrc r24, 3
>>     inc  r24
>>     ;; parity is in r24[0]
>>     andi r24, 1
>>
>> An implementation similar to the original avr-libc
>> using one instruction/tick more is
>>
>>     mov __tmp_reg__,r24
>>     swap r24
>>     eor r24,__tmp_reg__
>>     mov __tmp_reg__,r24
>>     lsr r24
>>     lsr r24
>>     eor r24,__tmp_reg__
>>     lsr r24
>>     sbci r24,0
>>     andi r24,1
> 
> IIRC you're required to clear r25 even if the return type is uint8_t,
> i.e., the return type needs to be upgraded to "int".
> 
> I think this happens because, since most expressions are upgraded to int
> anyway, it is cheaper to do it in one place than in all call sites.


The current parity_even_bit is not a function, it's just a macro that
expands as

#define parity_even_bit(val)                            \
(__extension__({                                        \
        unsigned char __t;                              \
        __asm__ (                                       \
                "mov __tmp_reg__,%0" "\n\t"             \
                "swap %0" "\n\t"                        \
                "eor %0,__tmp_reg__" "\n\t"             \
                "mov __tmp_reg__,%0" "\n\t"             \
                "lsr %0" "\n\t"                         \
                "lsr %0" "\n\t"                         \
                "eor %0,__tmp_reg__"                    \
                : "=r" (__t)                            \
                : "0" ((unsigned char)(val))            \
                : "r0"                                  \
        );                                              \
        (((__t + 1) >> 1) & 1);                         \
 }))

the promotion is triggered by the last line, but there is
no need for a premature promotion; the last line could be

   ((unsigned char) (__t & 1))

or
   __t;

depending on what the asm does.

Doing the AND outside asm might be smart because tests like

  if (parity_even_bit (x))

get parts or condition code like Z-flag from AND.


Anyway, even if parity_even_bit was a function
   unsigned char parity_even_bit (unsigned car);
there was no need to promote the return value.

The current text of the AVR ABI states that 8-bit return
values are always promoted to int.

However, avr-gcc does not behave that way since 4.3+ and
in earlier version the promotion was just because of an
optimization flaw; there is/was no code in GCC's avr
backend that describes/described such a promotion.

There was a discussion between some avr maintainers some
time ago that addressed that issue.  The outcome was to
update the ABI instead of forcing avr-gcc to promote.
Updating the ABI will allow for better code generation,
in particular in avr-libc's assembler parts.
The only case when such an ABI change would trigger problems
is when assembler code calls a C-function returning an 8-bit
value and expects the return value to be promoted.

As avr-gcc actually does not promote since 4.3+ and there
is not a single bug report in that direction (at least non
I am aware of), I think it's very much preferred to adapt
the ABI to avr-gcc's behavior than the other way round.

BTW, avr-gcc 4.7 comes with optimized versions of some buitins,
e.g. parity:

unsigned char my_parity (unsigned char a, unsigned char b)
{
    return __builtin_parity (a) + b;
}

which translates to

my_parity:
        rcall __parityqi2
        add r24,r22
        ret

Notice that the compiler knows the register footprint, i.e.
it knows that b (passed in r22) will survive the call.
__parityqi2 uses the algorithm from above so that it might
be desirable to use __builtin_parity when optimizing for size.

Johann


_______________________________________________
AVR-GCC-list mailing list
AVR-GCC-list@nongnu.org
https://lists.nongnu.org/mailman/listinfo/avr-gcc-list

Re: [avr-gcc-list] [avr-libc]: Smarter parity implementation

Reply via email to