https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86436

            Bug ID: 86436
           Summary: IPA-ICF: missed optimization at class template member
                    functions
           Product: gcc
           Version: 8.1.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: petschy at gmail dot com
  Target Milestone: ---

Created attachment 44363
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=44363&action=edit
test case

The attached source has two class templates: NonFoldable<int> and
Foldable<int>.

The nonfoldable version uses the int template param in Bar() (a leaf function
in the call graph) to do a computation, so different template params result in
different code for Bar(). The Bar() instantiations therefore can't be folded,
and neither can their callers. So far so good.

Foldable<int> is very similar, but stores the template param in a const member
variable in the ctor, and Bar() uses that member. So now the value is
different, but the code is exactly the same.
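
The attachment is not reproduced in this message, so the following is only a
minimal sketch of what the two templates might look like. The function names
are taken from the disassembly; the bodies, the member name n_, the use of
GCC's noinline attribute and the smoke_test() helper are assumptions.

```cpp
#include <cassert>

// Assumed reconstruction: the template param is used directly in Bar(),
// so each instantiation compiles to different code.
template <int N>
struct NonFoldable {
    __attribute__((noinline)) int Bar(int x) { return x + N; }
    __attribute__((noinline)) int Foo(int x) { return Bar(x); }
};

// Assumed reconstruction: the template param is stored in a const member,
// so Bar() and Foo() compile to the same code for every N.
template <int N>
struct Foldable {
    const int n_;  // hypothetical member name
    Foldable() : n_(N) {}
    __attribute__((noinline)) int Bar(int x) { return n_ + x; }
    __attribute__((noinline)) int Foo(int x) { return Bar(x); }
};

// Free-standing callers, named as in the disassembly.
int nf_foo_0(NonFoldable<0>& o, int x) { return o.Foo(x); }
int nf_bar_0(NonFoldable<0>& o, int x) { return o.Bar(x); }
int nf_foo_42(NonFoldable<42>& o, int x) { return o.Foo(x); }
int nf_bar_42(NonFoldable<42>& o, int x) { return o.Bar(x); }

int f_foo_0(Foldable<0>& o, int x) { return o.Foo(x); }
int f_bar_0(Foldable<0>& o, int x) { return o.Bar(x); }
int f_foo_42(Foldable<42>& o, int x) { return o.Foo(x); }
int f_bar_42(Foldable<42>& o, int x) { return o.Bar(x); }

// Hypothetical helper: checks that both variants compute x + N.
void smoke_test() {
    NonFoldable<0> nf0;
    NonFoldable<42> nf42;
    Foldable<0> f0;
    Foldable<42> f42;
    assert(nf_foo_0(nf0, 1) == 1);
    assert(nf_bar_42(nf42, 1) == 43);
    assert(f_foo_0(f0, 1) == 1);
    assert(f_bar_42(f42, 1) == 43);
}
```

In this sketch the Foldable<0> and Foldable<42> functions differ only in the
data stored in the object, not in the instructions they execute.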

Some asm dump of x86-64 code:

7.3.1 (9bcef54ae6c7df97b276a7fa8da4c90d2452333c):
Dump of assembler code for function nf_foo_0(NonFoldable<0>&, int):
   0x00000000004004d0 <+0>:     mov    %esi,%edi
   0x00000000004004d2 <+2>:     jmp    0x4004a0 <NonFoldable<0>::Foo(int)>

Dump of assembler code for function nf_bar_0(NonFoldable<0>&, int):
   0x00000000004004e0 <+0>:     mov    %esi,%edi
   0x00000000004004e2 <+2>:     jmp    0x400490 <NonFoldable<0>::Bar(int)>

Dump of assembler code for function NonFoldable<0>::Foo(int):
   0x00000000004004a0 <+0>:     jmp    0x400490 <NonFoldable<0>::Bar(int)>

Dump of assembler code for function NonFoldable<0>::Bar(int):
   0x0000000000400490 <+0>:     mov    %edi,%eax
   0x0000000000400492 <+2>:     retq   


Dump of assembler code for function nf_foo_42(NonFoldable<42>&, int):
   0x00000000004004f0 <+0>:     mov    %esi,%edi
   0x00000000004004f2 <+2>:     jmp    0x4004c0 <NonFoldable<42>::Foo(int)>

Dump of assembler code for function nf_bar_42(NonFoldable<42>&, int):
   0x0000000000400500 <+0>:     mov    %esi,%edi
   0x0000000000400502 <+2>:     jmp    0x4004b0 <NonFoldable<42>::Bar(int)>

Dump of assembler code for function NonFoldable<42>::Foo(int):
   0x00000000004004c0 <+0>:     jmp    0x4004b0 <NonFoldable<42>::Bar(int)>

Dump of assembler code for function NonFoldable<42>::Bar(int):
   0x00000000004004b0 <+0>:     lea    0x2a(%rdi),%eax
   0x00000000004004b3 <+3>:     retq   


Dump of assembler code for function f_foo_0(Foldable<0>&, int):
   0x0000000000400510 <+0>:     jmpq   0x400560 <Foldable<0>::Foo(int)>

Dump of assembler code for function f_bar_0(Foldable<0>&, int):
   0x0000000000400520 <+0>:     jmpq   0x400550 <Foldable<0>::Bar(int)>

Dump of assembler code for function Foldable<0>::Foo(int):
   0x0000000000400560 <+0>:     jmpq   0x400550 <Foldable<0>::Bar(int)>

Dump of assembler code for function Foldable<0>::Bar(int):
   0x0000000000400550 <+0>:     mov    (%rdi),%eax
   0x0000000000400552 <+2>:     add    %esi,%eax
   0x0000000000400554 <+4>:     retq   


Dump of assembler code for function f_foo_42(Foldable<42>&, int):
   0x0000000000400530 <+0>:     jmpq   0x400580 <Foldable<42>::Foo(int)>

Dump of assembler code for function f_bar_42(Foldable<42>&, int):
   0x0000000000400540 <+0>:     jmpq   0x400570 <Foldable<42>::Bar(int)>

Dump of assembler code for function Foldable<42>::Foo(int):
   0x0000000000400580 <+0>:     jmpq   0x400570 <Foldable<42>::Bar(int)>

Dump of assembler code for function Foldable<42>::Bar(int):
   0x0000000000400570 <+0>:     mov    (%rdi),%eax
   0x0000000000400572 <+2>:     add    %esi,%eax
   0x0000000000400574 <+4>:     retq   

Under 7.3.1 no identical code folding happens at all. 8.1.1 and 9.0.0 fold only
Foldable<0>::Bar() with Foldable<42>::Bar(). Foo() and the free-standing
functions calling these members are not recognized as foldable.

I haven't thought really hard about the folding rules. Checking each fn against
each fn globally by default is probably waaay too much work. However,
instantiations of class template members are easy candidates. Or, on a wider
scale, functions where the number and types of the args and the return type are
the same, OR at least the same size. The scope should be configurable, eg
compilation unit or shared lib / executable (LTO).

My quick (and probably incomplete) rules would be:
- the functions are leaf, OR call only the same or foldable functions
- the accessed global variables (incl static members) are the same
- the accessed member variables' offsets are the same, and the types are the
  same, OR at least they have the same size and the computations have the
  same results bitwise. Eg signed vs unsigned ints of the same size: reading
  a member variable and passing it to a foldable fn, or performing some
  computation, eg addition, but not doing anything that would make a
  difference, like checking the sign of the result, etc.

This means that eg member functions of totally unrelated classes could be
folded, if the accessed members are the same type or similar enough in relation
to the operations performed.
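
To illustrate the last rule, here is a small sketch with two unrelated classes.
Counter, Gauge and all their members are hypothetical names that do not appear
in the attached test case. Add() reads a member at the same offset and performs
an addition whose result is bitwise the same for int and unsigned (as long as
the signed addition does not overflow), so it would be a folding candidate
under the rules above. Below() compares the values, where signedness matters,
so it would not be.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical, unrelated classes with the same layout.
struct Counter {
    int id;
    int value;
    // Candidate: load at offset of 'value', then add.
    int Add(int x) const { return value + x; }
    // Not a candidate: signed comparison.
    bool Below(int limit) const { return value < limit; }
};

struct Gauge {
    unsigned tag;
    unsigned level;
    // Candidate: same load and add, same bits as the signed version
    // whenever the signed addition does not overflow.
    unsigned Add(unsigned x) const { return level + x; }
    // Not a candidate: unsigned comparison gives different results
    // for the same bit patterns.
    bool Below(unsigned limit) const { return level < limit; }
};

// The accessed members sit at the same offset and have the same size.
static_assert(offsetof(Counter, value) == offsetof(Gauge, level),
              "accessed members must be at the same offset");
static_assert(sizeof(int) == sizeof(unsigned),
              "accessed members must have the same size");
```

With the same bit pattern stored in both objects, Add() returns the same bits
from either class, while Below() can disagree: a stored -1 is below 0 as an
int, but the same bits read as unsigned are the largest possible value.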

Bug #80277 is also about ICF, but it involves only free-standing functions.

The platform used was Debian Stretch AMD64. The GCC versions tested:

$ g++-7.3.1 -v
Using built-in specs.
COLLECT_GCC=g++-7.3.1
COLLECT_LTO_WRAPPER=/usr/local/libexec/gcc/x86_64-pc-linux-gnu/7.3.1/lto-wrapper
Target: x86_64-pc-linux-gnu
Configured with: ../configure --enable-languages=c,c++ --disable-multilib
--program-suffix=-7.3.1 --disable-bootstrap CFLAGS='-Ofast -march=native
-mtune=native' CXXFLAGS='-Ofast -march=native -mtune=native' CC=gcc-7.3.1
CXX=g++-7.3.1
Thread model: posix
gcc version 7.3.1 20180708 (GCC)

$ g++-8.1.1 -v
Using built-in specs.
COLLECT_GCC=g++-8.1.1
COLLECT_LTO_WRAPPER=/usr/local/libexec/gcc/x86_64-pc-linux-gnu/8.1.1/lto-wrapper
Target: x86_64-pc-linux-gnu
Configured with: ../configure --enable-languages=c,c++ --disable-multilib
--program-suffix=-8.1.1 --disable-bootstrap CFLAGS='-O2 -march=native
-mtune=native' CXXFLAGS='-O2 -march=native -mtune=native' CC=gcc-7.3.1
CXX=g++-7.3.1
Thread model: posix
gcc version 8.1.1 20180708 (GCC)

$ g++-9.0.0 -v
Using built-in specs.
COLLECT_GCC=g++-9.0.0
COLLECT_LTO_WRAPPER=/usr/local/libexec/gcc/x86_64-pc-linux-gnu/9.0.0/lto-wrapper
Target: x86_64-pc-linux-gnu
Configured with: ../configure --enable-languages=c,c++ --disable-multilib
--program-suffix=-9.0.0 --disable-bootstrap CFLAGS='-O2 -march=native
-mtune=native' CXXFLAGS='-O2 -march=native -mtune=native' CC=gcc-7.3.1
CXX=g++-7.3.1
Thread model: posix
gcc version 9.0.0 20180708 (experimental) (GCC)
