Hi,
I am comparing the assembly generated by compilers targeting the arm-wince
platform, and it seems that the cross-compiler built from gcc trunk produces
less optimized code than an older one based on gcc 4.1.x.
Here is the comparison obtained from objdump:
cegcc-4.1.x:
00011000 <WinMainCRTStartup>:
11000: e92d40f0 push {r4, r5, r6, r7, lr}
11004: e1a04000 mov r4, r0
11008: e1a05001 mov r5, r1
1100c: e1a06002 mov r6, r2
11010: e1a07003 mov r7, r3
11014: eb0000de bl 11394 <_fpreset>
11018: eb00002a bl 110c8 <_pei386_runtime_relocator>
1101c: eb000099 bl 11288 <__atexit_init>
11020: eb0000d3 bl 11374 <__gccmain>
11024: e1a01005 mov r1, r5
11028: e1a00004 mov r0, r4
1102c: e1a02006 mov r2, r6
11030: e1a03007 mov r3, r7
11034: eb000005 bl 11050 <WinMain>
11038: e1a04000 mov r4, r0
1103c: eb000087 bl 11260 <_cexit>
11040: e1a01004 mov r1, r4
11044: e3a00042 mov r0, #66 ; 0x42
11048: eb0000d4 bl 113a0 <TerminateProcess>
1104c: eafffffe b 1104c <WinMainCRTStartup+0x4c>
cegcc-4.4.x:
00011000 <WinMainCRTStartup>:
11000: e92d4010 push {r4, lr}
11004: e1a04000 mov r4, r0
11008: e24dd00c sub sp, sp, #12 ; 0xc
1100c: e58d1008 str r1, [sp, #8]
11010: e58d2004 str r2, [sp, #4]
11014: e58d3000 str r3, [sp]
11018: eb000120 bl 114a0 <_fpreset>
1101c: eb000043 bl 11130 <_pei386_runtime_relocator>
11020: eb0000ce bl 11360 <__atexit_init>
11024: eb000111 bl 11470 <__gccmain>
11028: e59d1008 ldr r1, [sp, #8]
1102c: e1a00004 mov r0, r4
11030: e59d2004 ldr r2, [sp, #4]
11034: e59d3000 ldr r3, [sp]
11038: eb000028 bl 110e0 <WinMain>
1103c: e1a04000 mov r4, r0
11040: eb0000ba bl 11330 <_cexit>
11044: e1a01004 mov r1, r4
11048: e3a00042 mov r0, #66 ; 0x42
1104c: eb000116 bl 114ac <TerminateProcess>
11050: eafffffe b 11050 <WinMainCRTStartup+0x50>
11054: e1a00000 nop (mov r0,r0)
11058: e1a00000 nop (mov r0,r0)
1105c: e1a00000 nop (mov r0,r0)
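If you compare addresses 11004-11010 in the first listing with 11004-11014
in the second, you can see that the old gcc keeps the incoming arguments
r0-r3 in registers (r4-r7), while the upcoming gcc 4.4 keeps only r0 in a
register (r4) and stores r1-r3 on the stack.
I tried adding optimization flags such as -O2, but it does not change the
result.
Is there anything I can do to improve this? Is this normal behavior?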
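For reference, here is a minimal sketch in C of the shape I assume the
startup function has, reconstructed only from the call sequence in the
disassembly. The parameter types, the stub_* helpers and the name
startup_sketch are my own placeholders, not the actual cegcc sources; the
stubs just record the call order so the sketch can be built and run on any
host. The point is that the four arguments must stay live across four calls
before they are passed on to WinMain.

```c
#include <string.h>

/* Records the order in which the (hypothetical) stubs are called. */
static char call_log[128];

static void log_call(const char *name)
{
    strcat(call_log, name);
    strcat(call_log, ";");
}

/* Placeholders for the runtime helpers seen in the disassembly.
   Their real prototypes are assumptions. */
static void stub_fpreset(void)     { log_call("fpreset"); }
static void stub_relocator(void)   { log_call("relocator"); }
static void stub_atexit_init(void) { log_call("atexit_init"); }
static void stub_gccmain(void)     { log_call("gccmain"); }
static void stub_cexit(void)       { log_call("cexit"); }
static void stub_terminate(int rc) { (void)rc; log_call("terminate"); }

static int stub_WinMain(void *hInst, void *hPrev, const char *cmdLine, int nShow)
{
    (void)hInst; (void)hPrev; (void)cmdLine;
    log_call("WinMain");
    return nShow; /* arbitrary return value so the caller has something to pass on */
}

/* Same call order as in both listings. The real function never returns
   (it ends in an infinite loop after TerminateProcess); this sketch
   returns the WinMain result instead so it can be checked. */
int startup_sketch(void *hInst, void *hPrev, const char *cmdLine, int nShow)
{
    int ret;

    stub_fpreset();
    stub_relocator();
    stub_atexit_init();
    stub_gccmain();
    /* All four incoming arguments are still needed here. */
    ret = stub_WinMain(hInst, hPrev, cmdLine, nShow);
    stub_cexit();
    stub_terminate(ret);
    return ret;
}
```

To look at the generated code rather than just run it, the helpers would
have to live in another translation unit (or otherwise not be inlined), as
they do in the real startup code.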
Maybe my remark is not relevant, because I have not run any benchmark, and
I agree that gcc trunk not optimizing this specific part does not mean the
resulting code will be slower.
I have also noticed that I now get some nop instructions, and when I ask gcc
to generate assembly I can see that the alignment directive is different.
I used to get .align 0 with gcc-4.1 and now I get .align 4. How can I
change that?
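If I read the gas documentation correctly, on ARM the argument of .align is
a power of two, so .align 4 asks for a 16-byte boundary. That would match
the listing: the function ends at 0x11054 and three 4-byte nops pad it up to
0x11060. A small sketch of that arithmetic (padding_bytes is a hypothetical
helper name, used only for illustration):

```c
/* Number of padding bytes needed to advance addr to the next
   2^log2_align boundary (0 if addr is already aligned). */
unsigned long padding_bytes(unsigned long addr, unsigned log2_align)
{
    unsigned long align = 1UL << log2_align;

    return (align - (addr & (align - 1))) & (align - 1);
}
```

With addr = 0x11054 and log2_align = 4 this gives 12 bytes, i.e. three ARM
instructions, which is the number of nops in the second listing.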
And finally, maybe those nop insns prevent the compiler from optimizing...
Thanks