https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86436
Bug ID: 86436
Summary: IPA-ICF: missed optimization at class template member functions
Product: gcc
Version: 8.1.1
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c++
Assignee: unassigned at gcc dot gnu.org
Reporter: petschy at gmail dot com
Target Milestone: ---

Created attachment 44363
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=44363&action=edit
test case

The attached source has two class templates: NonFoldable<int> and Foldable<int>.

The non-foldable version uses the int template parameter in Bar() (a leaf function in the call graph) to do a computation. Different template parameters therefore result in different Bar() bodies, which can't be folded, and because of that neither can their callers. So far so good.

Foldable<int> is very similar, but it stores the template parameter in a const member variable in the constructor, and Bar() uses that member. Now the value differs per instantiation, but the code is exactly the same.

Some asm dumps of the x86-64 code:

7.3.1 (9bcef54ae6c7df97b276a7fa8da4c90d2452333c):

Dump of assembler code for function nf_foo_0(NonFoldable<0>&, int):
   0x00000000004004d0 <+0>: mov %esi,%edi
   0x00000000004004d2 <+2>: jmp 0x4004a0 <NonFoldable<0>::Foo(int)>

Dump of assembler code for function nf_bar_0(NonFoldable<0>&, int):
   0x00000000004004e0 <+0>: mov %esi,%edi
   0x00000000004004e2 <+2>: jmp 0x400490 <NonFoldable<0>::Bar(int)>

Dump of assembler code for function NonFoldable<0>::Foo(int):
   0x00000000004004a0 <+0>: jmp 0x400490 <NonFoldable<0>::Bar(int)>

Dump of assembler code for function NonFoldable<0>::Bar(int):
   0x0000000000400490 <+0>: mov %edi,%eax
   0x0000000000400492 <+2>: retq

Dump of assembler code for function nf_foo_42(NonFoldable<42>&, int):
   0x00000000004004f0 <+0>: mov %esi,%edi
   0x00000000004004f2 <+2>: jmp 0x4004c0 <NonFoldable<42>::Foo(int)>

Dump of assembler code for function nf_bar_42(NonFoldable<42>&, int):
   0x0000000000400500 <+0>: mov %esi,%edi
   0x0000000000400502 <+2>: jmp 0x4004b0 <NonFoldable<42>::Bar(int)>

Dump of assembler code for function NonFoldable<42>::Foo(int):
   0x00000000004004c0 <+0>: jmp 0x4004b0 <NonFoldable<42>::Bar(int)>

Dump of assembler code for function NonFoldable<42>::Bar(int):
   0x00000000004004b0 <+0>: lea 0x2a(%rdi),%eax
   0x00000000004004b3 <+3>: retq

Dump of assembler code for function f_foo_0(Foldable<0>&, int):
   0x0000000000400510 <+0>: jmpq 0x400560 <Foldable<0>::Foo(int)>

Dump of assembler code for function f_bar_0(Foldable<0>&, int):
   0x0000000000400520 <+0>: jmpq 0x400550 <Foldable<0>::Bar(int)>

Dump of assembler code for function Foldable<0>::Foo(int):
   0x0000000000400560 <+0>: jmpq 0x400550 <Foldable<0>::Bar(int)>

Dump of assembler code for function Foldable<0>::Bar(int):
   0x0000000000400550 <+0>: mov (%rdi),%eax
   0x0000000000400552 <+2>: add %esi,%eax
   0x0000000000400554 <+4>: retq

Dump of assembler code for function f_foo_42(Foldable<42>&, int):
   0x0000000000400530 <+0>: jmpq 0x400580 <Foldable<42>::Foo(int)>

Dump of assembler code for function f_bar_42(Foldable<42>&, int):
   0x0000000000400540 <+0>: jmpq 0x400570 <Foldable<42>::Bar(int)>

Dump of assembler code for function Foldable<42>::Foo(int):
   0x0000000000400580 <+0>: jmpq 0x400570 <Foldable<42>::Bar(int)>

Dump of assembler code for function Foldable<42>::Bar(int):
   0x0000000000400570 <+0>: mov (%rdi),%eax
   0x0000000000400572 <+2>: add %esi,%eax
   0x0000000000400574 <+4>: retq

Under 7.3.1, no identical code folding happens at all. 8.1.1 and 9.0.0 only fold Foldable<0>::Bar() with Foldable<42>::Bar(). Foo() and the free-standing functions calling these members are not recognized as foldable, even though they only forward to functions that are now identical.

I haven't thought really hard about the folding rules. Checking each function against every other function globally by default is probably waaay too much work. However, instantiations of class template members are easy candidates. On a wider scale, candidates could also be functions where the number and types of the arguments and the return type are the same, or at least have the same size. The scope should be configurable, e.g. compilation unit, or shared lib / executable (LTO).
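The attachment itself is not reproduced in this report. For reference, here is a hypothetical reconstruction of the test case, inferred from the description and the 7.3.1 disassembly. The function names come from the dump. The bodies are assumptions, and so is making NonFoldable's members static: the int argument arrives in %edi, so no `this` pointer seems to be passed. The `[[gnu::noinline]]` attributes are also an assumption, added so that Foo() and Bar() stay separate functions as they do in the dump.

```cpp
#include <cassert>

// Hypothetical reconstruction of attachment 44363 (not the original file).
template <int N>
struct NonFoldable {
    // N is baked into the instruction stream: in the dump, Bar<0> is
    // "mov %edi,%eax" and Bar<42> is "lea 0x2a(%rdi),%eax", so the bodies differ.
    [[gnu::noinline]] static int Bar(int x) { return x + N; }
    [[gnu::noinline]] static int Foo(int x) { return Bar(x); }
};

template <int N>
struct Foldable {
    const int n;            // N is only a runtime value; layout is the same for every N
    Foldable() : n(N) {}
    // Identical code for every N: load this->n, add x, return.
    [[gnu::noinline]] int Bar(int x) const { return n + x; }
    [[gnu::noinline]] int Foo(int x) const { return Bar(x); }
};

// Free-standing callers, named as in the dump.
int nf_foo_0(NonFoldable<0>& o, int x) { return o.Foo(x); }
int nf_bar_0(NonFoldable<0>& o, int x) { return o.Bar(x); }
int nf_foo_42(NonFoldable<42>& o, int x) { return o.Foo(x); }
int nf_bar_42(NonFoldable<42>& o, int x) { return o.Bar(x); }
int f_foo_0(Foldable<0>& o, int x) { return o.Foo(x); }
int f_bar_0(Foldable<0>& o, int x) { return o.Bar(x); }
int f_foo_42(Foldable<42>& o, int x) { return o.Foo(x); }
int f_bar_42(Foldable<42>& o, int x) { return o.Bar(x); }
```

Building this with `g++ -O2 -c` and disassembling it with `objdump -dC` should give code shaped like the dump above. This has not been checked against the original attachment, though.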
My quick (and probably incomplete) rules would be:
- the functions are leaf functions, OR call only the same or foldable functions
- the accessed global variables (incl. static members) are the same
- the accessed member variables' offsets are the same, and their types are the same, OR at least they have the same size and the computations have bitwise identical results. E.g. signed vs unsigned ints of the same size: reading a member variable and passing it to a foldable function, or performing some computation such as an addition, as long as nothing is done that would make a difference, like checking the sign of the result.

This means that e.g. member functions of totally unrelated classes could be folded, if the accessed members have the same type or are similar enough with respect to the operations performed.

Bug #80277 is also about ICF, but it only involves free-standing functions.

The platform used was Debian Stretch AMD64. The GCC versions tested:

$ g++-7.3.1 -v
Using built-in specs.
COLLECT_GCC=g++-7.3.1
COLLECT_LTO_WRAPPER=/usr/local/libexec/gcc/x86_64-pc-linux-gnu/7.3.1/lto-wrapper
Target: x86_64-pc-linux-gnu
Configured with: ../configure --enable-languages=c,c++ --disable-multilib --program-suffix=-7.3.1 --disable-bootstrap CFLAGS='-Ofast -march=native -mtune=native' CXXFLAGS='-Ofast -march=native -mtune=native' CC=gcc-7.3.1 CXX=g++-7.3.1
Thread model: posix
gcc version 7.3.1 20180708 (GCC)

$ g++-8.1.1 -v
Using built-in specs.
COLLECT_GCC=g++-8.1.1
COLLECT_LTO_WRAPPER=/usr/local/libexec/gcc/x86_64-pc-linux-gnu/8.1.1/lto-wrapper
Target: x86_64-pc-linux-gnu
Configured with: ../configure --enable-languages=c,c++ --disable-multilib --program-suffix=-8.1.1 --disable-bootstrap CFLAGS='-O2 -march=native -mtune=native' CXXFLAGS='-O2 -march=native -mtune=native' CC=gcc-7.3.1 CXX=g++-7.3.1
Thread model: posix
gcc version 8.1.1 20180708 (GCC)

$ g++-9.0.0 -v
Using built-in specs.
COLLECT_GCC=g++-9.0.0
COLLECT_LTO_WRAPPER=/usr/local/libexec/gcc/x86_64-pc-linux-gnu/9.0.0/lto-wrapper
Target: x86_64-pc-linux-gnu
Configured with: ../configure --enable-languages=c,c++ --disable-multilib --program-suffix=-9.0.0 --disable-bootstrap CFLAGS='-O2 -march=native -mtune=native' CXXFLAGS='-O2 -march=native -mtune=native' CC=gcc-7.3.1 CXX=g++-7.3.1
Thread model: posix
gcc version 9.0.0 20180708 (experimental) (GCC)
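The first proposed folding rule (a function is foldable if it is a leaf or calls only the same or foldable functions) is recursive. That is why folding Bar() alone does not go far enough. One way to evaluate such a rule is optimistic partition refinement. First, group functions by signature and local code. Then repeatedly split any group whose members call functions from different groups, until nothing changes. The sketch below is a toy model of that idea, not GCC's IPA-ICF implementation. The `Fn` struct, `FoldClasses()` and the string encoding of signatures and bodies are all hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy model: a function is its signature, its own code with
// call targets abstracted away, and the indices of its callees in call order.
struct Fn {
    std::string name;
    std::string signature;    // e.g. "int(ptr,int)": arg/return types or sizes
    std::string body;         // normalized local code, callees as placeholders
    std::vector<int> callees; // indices into the function list
};

// Returns a class id per function; functions with equal ids are foldable.
std::vector<int> FoldClasses(const std::vector<Fn>& fns) {
    // Step 1 (optimistic start): group by signature and local body only.
    std::map<std::pair<std::string, std::string>, int> initial;
    std::vector<int> cls(fns.size());
    for (std::size_t i = 0; i < fns.size(); ++i) {
        auto key = std::make_pair(fns[i].signature, fns[i].body);
        cls[i] = initial.emplace(key, static_cast<int>(initial.size())).first->second;
    }
    // Step 2: refine. Two functions stay together only if they were together
    // before AND their callees are pairwise in the same class.
    std::size_t count = initial.size();
    while (true) {
        std::map<std::pair<int, std::vector<int>>, int> refined;
        std::vector<int> next(fns.size());
        for (std::size_t i = 0; i < fns.size(); ++i) {
            std::vector<int> calleeCls;
            for (int c : fns[i].callees) calleeCls.push_back(cls[c]);
            auto key = std::make_pair(cls[i], calleeCls);
            next[i] = refined.emplace(key, static_cast<int>(refined.size())).first->second;
        }
        cls = next;
        // Each round only splits classes, so an unchanged count means an
        // unchanged partition, i.e. a fixed point.
        if (refined.size() == count) break;
        count = refined.size();
    }
    return cls;
}
```

Suppose the model is fed with the test case: Bar(), Foo() and the f_foo_* wrappers for both Foldable instantiations, plus the NonFoldable Bar()/Foo() pairs. The two Foldable Bar()s start in one class. The two Foo()s and the two wrappers then stay merged because their callees are equivalent. The NonFoldable chain is split from the start, because the two Bar() bodies differ. Encoding the `this` pointer of Foldable<0> and Foldable<42> as the same "ptr" type corresponds to the "or at least the same size" relaxation.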