Re: LTO vs GCC 8

David Brown Mon, 14 May 2018 07:35:01 -0700

On 11/05/18 17:49, Freddie Chopin wrote:
> On Fri, 2018-05-11 at 13:06 +0200, David Brown wrote:
>> For the Cortex-M devices (and probably many other RISC targets),
>> -fdata-sections comes at a big cost - it effectively blocks
>> -fsection-anchors and makes access to file-static data a lot bigger.
>> People often use -fdata-sections and -ffunction-sections along with
>> -Wl,--gc-sections with the aim of removing unused code and data (and
>> thus saving space, useful on small devices) - I would expect LTO
>> would manage that anyway.  The other purpose of these is to improve
>> locality of reference - again LTO should do that for you.  But even
>> without LTO, I find the cost of -fdata-sections high compared to
>> -fsection-anchors.
> 
> Unfortunately having LTO doesn't make -ffunction-sections +
> -fdata-sections + --gc-sections useless.
> 
> My test project compiled:
> - without LTO and without these attributes - 150824 B ROM + 4240 B RAM
> - with LTO and without these attributes - 133812 B ROM + 4208 B RAM
> - without LTO and with these attributes - 124456 B ROM + 3484 B RAM
> - with LTO and with these attributes - 120280 B ROM + 3680 B RAM
> 
> As you see these attributes give much more than LTO in terms of size.
>


Interesting.  Making these sections and then using gc-sections should
only remove code that is not used - LTO should do that anyway.

Have you tried with -ffunction-sections and not -fdata-sections?  It is
the -fdata-sections that ruins -fsection-anchors - the
-ffunction-sections doesn't have the same kind of cost.

>
> As for the -fsection-anchors I guess this has no use for non-PIC code
> for arm-none-eabi. Whether I use it or not, the sizes are identical.
> 

No, -fsection-anchors has plenty of use for fixed-position eabi code.

Take this little example code:

static int x;
static int y;
static int z;

void foo(void) {
        int t = x;
        x = y;
        y = z;
        z = t;
}

Compiled with gcc (4.8, as that's what I had to hand) using -O2
-mcpu=cortex-m4 -mthumb, and with -fsection-anchors (enabled
automatically at -O2, I believe), this gives:

                   foo:
                           @ args = 0, pretend = 0, frame = 0
                           @ frame_needed = 0, uses_anonymous_args = 0
                           @ link register save eliminated.
0000 034B                  ldr     r3, .L2
0002 93E80500              ldmia   r3, {r0, r2}
0006 9968                  ldr     r1, [r3, #8]
0008 1A60                  str     r2, [r3]
000a 9860                  str     r0, [r3, #8]
000c 5960                  str     r1, [r3, #4]
000e 7047                  bx      lr
                   .L3:
                           .align  2
                   .L2:
0010 00000000              .word   .LANCHOR0
                           .bss
                           .align  2
                           .set    .LANCHOR0,. + 0
                   x:
0000 00000000              .space  4
                   y:
0004 00000000              .space  4
                   z:
0008 00000000              .space  4


With -fdata-sections, I get:

                   foo:
                           @ args = 0, pretend = 0, frame = 0
                           @ frame_needed = 0, uses_anonymous_args = 0
                           @ link register save eliminated.
0000 30B4                  push    {r4, r5}
0002 0549                  ldr     r1, .L2
0004 054B                  ldr     r3, .L2+4
0006 064A                  ldr     r2, .L2+8
0008 0D68                  ldr     r5, [r1]
000a 1468                  ldr     r4, [r2]
000c 1868                  ldr     r0, [r3]
000e 1560                  str     r5, [r2]
0010 1C60                  str     r4, [r3]
0012 0860                  str     r0, [r1]
0014 30BC                  pop     {r4, r5}
0016 7047                  bx      lr
                   .L3:
                           .align  2
                   .L2:
0018 00000000              .word   .LANCHOR0
001c 00000000              .word   .LANCHOR1
0020 00000000              .word   .LANCHOR2
                           .section        .bss.x,"aw",%nobits
                           .align  2
                           .set    .LANCHOR0,. + 0
                   x:
0000 00000000              .space  4
                           .section        .bss.y,"aw",%nobits
                           .align  2
                           .set    .LANCHOR1,. + 0
                   y:
0000 00000000              .space  4
                           .section        .bss.z,"aw",%nobits
                           .align  2
                           .set    .LANCHOR2,. + 0
                   z:
0000 00000000              .space  4


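The code is clearly bigger and slower: 24 bytes of instructions instead
of 16, a push/pop of r4 and r5, and three anchor words in the code
section instead of one (36 bytes in total against 20).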
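
What the single anchor in the first listing buys is one base address
with x, y and z at fixed offsets from it.  A hand-written analogue, if
you are stuck with -fdata-sections, is to group the variables in one
struct.  This is only a sketch: the names "state" and "foo_grouped" are
made up for illustration, and I have not measured the result.  A struct
is a single object, so it stays in one section and its members sit at
fixed offsets from its address, and I would expect code much like the
first listing.

```c
/* Hypothetical grouping of the three file-static variables.
 * "state" and "foo_grouped" are illustrative names only. */
static struct {
        int x;
        int y;
        int z;
} state;

/* Same rotation as foo() above: x <- y, y <- z, z <- old x.
 * All three members can be addressed from the one address of "state". */
void foo_grouped(void) {
        int t = state.x;
        state.x = state.y;
        state.y = state.z;
        state.z = t;
}
```

The price is that --gc-sections can then only keep or drop the struct as
a whole, so this fits data that is always used together.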


Note that to get similar improvements with non-static data, you need
"-fno-common" - a flag that I believe should be the default for the
compiler.
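
For reference, the non-static variant of the example is just the same
code without "static".  As a sketch of what changes: when common symbols
are in use, the uninitialised definitions below are tentative
definitions emitted as common symbols, whose final placement is left to
the linker; with -fno-common they are ordinary definitions in this
object file's .bss, like the static ones.

```c
/* Non-static variant of the example above.
 * With common symbols: x, y and z are emitted as common symbols.
 * With -fno-common:    x, y and z are defined in .bss of this file. */
int x;
int y;
int z;

void foo(void) {
        int t = x;
        x = y;
        y = z;
        z = t;
}
```

With -fno-common, defining the same variable in two source files becomes
a link-time multiple-definition error rather than a silent merge.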
