On 11/05/18 17:49, Freddie Chopin wrote:
> On Fri, 2018-05-11 at 13:06 +0200, David Brown wrote:
>> For the Cortex-M devices (and probably many other RISC targets),
>> -fdata-sections comes at a big cost - it effectively blocks
>> -fsection-anchors and makes access to file-static data a lot bigger.
>> People often use -fdata-sections and -ffunction-sections along with
>> -Wl,--gc-sections with the aim of removing unused code and data (and
>> thus saving space, useful on small devices) - I would expect LTO
>> would manage that anyway. The other purpose of these is to improve
>> locality of reference - again LTO should do that for you. But even
>> without LTO, I find the cost of -fdata-sections high compared to
>> -fsection-anchors.
>
> Unfortunately, having LTO doesn't make -ffunction-sections +
> -fdata-sections + --gc-sections useless.
>
> My test project compiled:
> - without LTO and without these options - 150824 B ROM + 4240 B RAM
> - with LTO and without these options - 133812 B ROM + 4208 B RAM
> - without LTO and with these options - 124456 B ROM + 3484 B RAM
> - with LTO and with these options - 120280 B ROM + 3680 B RAM
>
> As you can see, these options give much more than LTO in terms of size.
>
Interesting. Making these sections and then using --gc-sections should
only remove code that is not used - LTO should do that anyway.

Have you tried with -ffunction-sections and not -fdata-sections? It is
-fdata-sections that ruins -fsection-anchors - -ffunction-sections
doesn't have the same kind of cost.
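
To make concrete what the linker can and cannot drop, here is a minimal
sketch (the names used_sum, unused_sum and the two tables are made up for
illustration, not taken from your project). Everything has external
linkage, so the compiler has to emit it all even though nothing calls
unused_sum():

```c
#include <stdint.h>

/* External linkage: the compiler cannot discard these on its own,
   even if no other file ever refers to the "unused" ones. */
const uint8_t used_table[4]   = { 1, 2, 3, 4 };
const uint8_t unused_table[4] = { 5, 6, 7, 8 };

/* Assume this one is called from elsewhere in the program. */
unsigned used_sum(void)
{
    unsigned s = 0;
    for (unsigned i = 0; i < 4; i++)
        s += used_table[i];
    return s;
}

/* Assume nothing in the final program calls this one. */
unsigned unused_sum(void)
{
    unsigned s = 0;
    for (unsigned i = 0; i < 4; i++)
        s += unused_table[i];
    return s;
}
```

Without -ffunction-sections both functions end up in the same .text input
section of the object file, and --gc-sections can only keep or discard an
input section as a whole. With -ffunction-sections they go to
.text.used_sum and .text.unused_sum, so the unused one can be dropped on
its own; -fdata-sections does the same job for the two tables.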
>
> As for the -fsection-anchors I guess this has no use for non-PIC code
> for arm-none-eabi. Whether I use it or not, the sizes are identical.
>
No, -fsection-anchors has plenty of use for fixed-position eabi code.
Take this little example code:

    static int x;
    static int y;
    static int z;

    void foo(void) {
        int t = x;
        x = y;
        y = z;
        z = t;
    }

Compiled with gcc (4.8, as that's what I had to hand) with -O2
-mcpu=cortex-m4 -mthumb and -fsection-anchors (enabled automatically
with -O2, I believe), this gives:

    foo:
                   @ args = 0, pretend = 0, frame = 0
                   @ frame_needed = 0, uses_anonymous_args = 0
                   @ link register save eliminated.
    0000 034B      ldr   r3, .L2
    0002 93E80500  ldmia r3, {r0, r2}
    0006 9968      ldr   r1, [r3, #8]
    0008 1A60      str   r2, [r3]
    000a 9860      str   r0, [r3, #8]
    000c 5960      str   r1, [r3, #4]
    000e 7047      bx    lr
    .L3:
                   .align 2
    .L2:
    0010 00000000  .word .LANCHOR0
                   .bss
                   .align 2
                   .set .LANCHOR0,. + 0
    x:
    0000 00000000  .space 4
    y:
    0004 00000000  .space 4
    z:
    0008 00000000  .space 4

With -fdata-sections, I get:

    foo:
                   @ args = 0, pretend = 0, frame = 0
                   @ frame_needed = 0, uses_anonymous_args = 0
                   @ link register save eliminated.
    0000 30B4      push  {r4, r5}
    0002 0549      ldr   r1, .L2
    0004 054B      ldr   r3, .L2+4
    0006 064A      ldr   r2, .L2+8
    0008 0D68      ldr   r5, [r1]
    000a 1468      ldr   r4, [r2]
    000c 1868      ldr   r0, [r3]
    000e 1560      str   r5, [r2]
    0010 1C60      str   r4, [r3]
    0012 0860      str   r0, [r1]
    0014 30BC      pop   {r4, r5}
    0016 7047      bx    lr
    .L3:
                   .align 2
    .L2:
    0018 00000000  .word .LANCHOR0
    001c 00000000  .word .LANCHOR1
    0020 00000000  .word .LANCHOR2
                   .section .bss.x,"aw",%nobits
                   .align 2
                   .set .LANCHOR0,. + 0
    x:
    0000 00000000  .space 4
                   .section .bss.y,"aw",%nobits
                   .align 2
                   .set .LANCHOR1,. + 0
    y:
    0000 00000000  .space 4
                   .section .bss.z,"aw",%nobits
                   .align 2
                   .set .LANCHOR2,. + 0
    z:
    0000 00000000  .space 4

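The code is clearly bigger and slower: 24 bytes of instructions instead
of 16, an extra push/pop pair, and three anchor words in the literal pool
in the code section instead of one.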
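
If you are stuck with -fdata-sections, one workaround (my suggestion, not
something I measured for the listings here) is to group related statics in
a struct. A struct is a single object and so lands in a single section,
and I would expect the compiler to address all three members from one
base address again. A sketch, with a made-up object name "state":

```c
/* Hypothetical grouping: one object, so one section even with
   -fdata-sections, and one base address for all three members. */
static struct {
    int x;
    int y;
    int z;
} state;

/* Same rotation as the original foo(): x <- y, y <- z, z <- old x. */
void foo(void)
{
    int t = state.x;
    state.x = state.y;
    state.y = state.z;
    state.z = t;
}
```

The price is that the linker can no longer garbage-collect the members
individually - the struct is kept or dropped as a whole.
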
Note that to get the same section-anchor benefit with non-static data,
you need "-fno-common" - a flag that I believe should be the default for
the compiler.
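
As a sketch of the non-static case (same function, but the variables now
have external linkage): without an initialiser these are tentative
definitions, and with -fcommon the compiler emits them as common symbols
whose placement is left to the linker, so it cannot assume they sit next
to each other and cannot share one anchor between them.

```c
/* Tentative definitions with external linkage. With -fcommon these
   become common symbols; with -fno-common they are ordinary
   definitions placed in .bss by the compiler. */
int x;
int y;
int z;

/* Same rotation as before: x <- y, y <- z, z <- old x. */
void foo(void)
{
    int t = x;
    x = y;
    y = z;
    z = t;
}
```

With -fno-common (and without -fdata-sections) I would expect the same
single-anchor code as in the static case.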