https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81541

            Bug ID: 81541
           Summary: Potential size optimisation: reusing common function
                    endings
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: other
          Assignee: unassigned at gcc dot gnu.org
          Reporter: jpakkane at gmail dot com
  Target Milestone: ---

In embedded development, code size optimizations are very important, and there
are some optimizations that commercial compilers perform but GCC does not seem
to. As an example, compiling the following code:

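// Declarations assumed; the functions are defined elsewhere.
int func1();
int func2();
int func3();
int func4();

int func5() {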
  int i = 0;
  i+=func2();
  i+=func3();
  i+=func4();
  return i+func1();
}

int func6() {
  int i=1;
  i+=func3();
  i+=func3();
  i+=func4();
  return i+func1();
}

On ARM using -Os produces the following assembly:

func5():
        push    {r4, lr}
        bl      func2()
        mov     r4, r0
        bl      func3()
        add     r4, r4, r0
        bl      func4()
        add     r4, r4, r0
        bl      func1()
        add     r0, r4, r0
        pop     {r4, lr}
        bx      lr
func6():
        push    {r4, lr}
        bl      func3()
        add     r4, r0, #1
        bl      func3()
        add     r4, r4, r0
        bl      func4()
        add     r4, r4, r0
        bl      func1()
        add     r0, r4, r0
        pop     {r4, lr}
        bx      lr

Note that both functions end with the same sequence of eight instructions.
This could be converted to the following:

func5():
        push    {r4, lr}
        bl      func2()
        mov     r4, r0
common_tail:
        bl      func3()
        add     r4, r4, r0
        bl      func4()
        add     r4, r4, r0
        bl      func1()
        add     r0, r4, r0
        pop     {r4, lr}
        bx      lr
func6():
        push    {r4, lr}
        bl      func3()
        add     r4, r0, #1
        b       common_tail

which has smaller code size: 15 instructions instead of 22, saving 28 bytes
at 4 bytes per ARM-mode instruction.
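
For illustration, the same sharing can be sketched by hand at the source level.
This is only a rough equivalent of the assembly above, not what a compiler pass
would do internally. The stub bodies of func1..func4, the names func5_shared,
func6_shared and check_same_results, and the common_tail helper are all
assumptions made up for this sketch.

```c
#include <assert.h>

/* Hypothetical stand-ins for the external functions in the report. */
int func1(void) { return 1; }
int func2(void) { return 20; }
int func3(void) { return 300; }
int func4(void) { return 4000; }

/* Original functions, as in the report. */
int func5(void) {
  int i = 0;
  i += func2();
  i += func3();
  i += func4();
  return i + func1();
}

int func6(void) {
  int i = 1;
  i += func3();
  i += func3();
  i += func4();
  return i + func1();
}

/* The shared ending, factored out by hand (hypothetical helper). */
int common_tail(int i) {
  i += func3();
  i += func4();
  return i + func1();
}

/* Each function keeps only its unique beginning, then enters the tail. */
int func5_shared(void) { return common_tail(0 + func2()); }
int func6_shared(void) { return common_tail(1 + func3()); }

/* Both versions make the same calls in the same order,
   so they must return the same values. */
void check_same_results(void) {
  assert(func5() == func5_shared());
  assert(func6() == func6_shared());
}
```

Whether this manual rewrite actually shrinks the binary depends on the
compiler's inlining decisions, which is why doing it in the compiler would be
preferable.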

I ran a simple experiment, which suggests that this might not be all that
useful on x86:
http://nibblestew.blogspot.com/2017/07/experiment-binary-size-reduction-by.html

On ARM and similar platforms this would probably be beneficial, especially
given that commercial compilers perform this optimization.
