Re: [avr-libc-dev] Re: eeprom_read_byte and clr ret_hi

Wouter van Gulik Tue, 24 Nov 2009 08:15:35 -0800

David Brown wrote:

Wouter van Gulik wrote:
David Brown wrote:
Wouter van Gulik wrote:
David Brown wrote:
Weddington, Eric wrote:
-----Original Message-----
From: avr-libc-dev-bounces+eric.weddington=atmel....@nongnu.org [mailto:avr-libc-dev-bounces+eric.weddington=atmel....@nongnu.org]
On Behalf Of Dmitry K.
Sent: Sunday, November 22, 2009 12:21 AM
To: avr-libc-dev@nongnu.org
Subject: Re: [avr-libc-dev] eeprom_read_byte and clr ret_hi
eeprom_read_byte returns a uint8_t. Why does it clear r25?

eerd_byte.S: clr ret_hi

Does the AVR ABI require that r25 be zeroed in a function returning a single byte? If not, this instruction could be removed.
This is a murky point. Look at an example:

unsigned char foo1 (unsigned char *p) { return *p; }

extern unsigned char ext2 (void);
int foo2 (void) { return ext2() + 1; }

Old avr-gcc versions (3.3 - 4.2) clear R25 in both cases, foo1() and
foo2().  The new avr-gcc versions (4.3.3 and 4.4.2) do not clear R25 in
foo1().

Note that the function's return value only ever appears in an
expression, so it is promoted to int. So it would be better to clear
R25 in foo1() only (in one place).
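
To make the promotion point concrete, here is a minimal host-side sketch in C. It is only an illustration: ext2_stub(), foo2_like(), foo2_truncated() and check_promotion() are made-up names, not part of avr-libc or of the code discussed in this thread. It shows that the 8-bit return value is widened to int before the addition, which is why some code, somewhere, has to produce a zero upper byte.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for ext2(): returns the largest 8-bit value. */
uint8_t ext2_stub(void) { return 255; }

/* Same shape as foo2(): the uint8_t result is promoted to int before
   the addition, so the sum does not wrap at 8 bits. */
int foo2_like(void) { return ext2_stub() + 1; }

/* For contrast: converting the sum back to 8 bits wraps around. */
uint8_t foo2_truncated(void) { return (uint8_t)(ext2_stub() + 1); }

/* Small self-check of the two behaviours described above. */
void check_promotion(void)
{
    assert(foo2_like() == 256);
    assert(foo2_truncated() == 0);
}
```

Calling check_promotion() from a test program should pass on both a 16-bit-int target such as AVR and a typical 32-bit-int host, since 256 fits in either.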
I agree that that is the way it should be.
I'm a little confused by this - I hope this is not implying a return to avr-gcc 4.2 R25 clearing? With avr-gcc 4.2, foo1() above would clear R25 even though it is returning an 8-bit value. This has always been a waste of time and space - caller functions make no use of the cleared R25, and thus clear it again themselves (such as in foo2() after calling ext2()). With avr-gcc 4.3, the extra clear R25 instructions are omitted for functions returning 8-bit values. This is the way it should be (unless the C standards disagree...), IMHO. But it looks a little like you want foo1() to clear R25 here?
Incidentally, avr-gcc 4.2.2 actually produces better code for foo2() than avr-gcc 4.3.2.
With 4.2.2, the code is:
foo2:
    call ext2
    ldi R25, lo8(0)
    adiw r24,1
    ret

With 4.3.2 (and 4.3.0), we get:
foo2:
    call ext2
    mov r18, r24
    ldi r19, lo8(0)
    subi r18, lo8(-(1))
    sbci r19, hi8(-(1))
    movw r24, r18
    ret
If this regression is news to you, I can take it up on the main avr-gcc mailing list and/or in a missed optimisation bug report.
This is "normal" GCC behaviour, caused by internal promotion to int types. Having the clr r25 would have saved the ldi r19. Although these types of missed optimization have been known for a long time, it (apparently) is difficult to fix them. IIRC the main problem is the carry bit propagation. I still don't know exactly why that is such a big problem, but then again I am no gcc expert.
I understand about int promotion, and how it's a PITA to remove all the extra code that the avr backend has to put in because the gcc frontend has promoted the 8-bit data to 16-bit ints. However, avr-gcc 4.2 /does/ do a clr R25 in the function returning an 8-bit result, and then it clears it /again/ after calling it (as in foo2 above). That is definitely a waste, and it has been removed in 4.3+.
The problem I have with the 4.3.2 code above is not the zeroing of the upper byte - that's standard int promotion, and is required for correct code. The problem is that the result of ext2() is first moved into a new register pair and then promoted to an int - an unnecessary move, which forces the subi/sbci pair instead of adiw, and requires an additional move before the ret instruction. In more complex code, the wasted register resources may also be relevant.
You're right, I was not grasping the real problem. However, using -fno-split-wide-types makes 4.3.2 behave like 4.2.2. Split wide types seems to be the problem here.
You are correct - I hadn't thought of that. However, -fno-split-wide-types is a workaround, rather than a solution. Ideally, the good code should be produced regardless of that flag, since split-wide-types is enabled implicitly by all -Ox flags. The split-wide-types is also useful to improve some code sequences, such as when you have 32-bit data but only want to look at part of it.
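
As a rough illustration of the "32-bit data but only part of it" case, the sketch below shows the kind of C code where splitting a wide value into independent byte registers could plausibly help. The helper names low_byte(), high_word() and check_parts() are hypothetical and not taken from avr-libc or from this thread. One way to see what the option does would be to compile such functions with avr-gcc -Os -S, once with and once without -fno-split-wide-types, and compare the assembly; the result will depend on the compiler version.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: only the lowest byte of a 32-bit value is used. */
uint8_t low_byte(uint32_t v) { return (uint8_t)(v & 0xFFu); }

/* Hypothetical helper: only the upper 16 bits of a 32-bit value are used. */
uint16_t high_word(uint32_t v) { return (uint16_t)(v >> 16); }

/* Small self-check of the two helpers. */
void check_parts(void)
{
    assert(low_byte(0x12345678UL) == 0x78);
    assert(high_word(0x12345678UL) == 0x1234);
}
```

On an 8-bit target each of these only needs some of the four registers holding the argument, which is the situation the quoted paragraph describes.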

Yes, of course it is a workaround, but it might be a hint as to where the problem is. I did some simple experiments and it seems that split wide types only enforces a copy to a new register pair, which makes the code less efficient.


HTH,

Wouter


_______________________________________________
AVR-libc-dev mailing list
AVR-libc-dev@nongnu.org
http://lists.nongnu.org/mailman/listinfo/avr-libc-dev
