Hi,
I've been playing around with the gcc -flto flag and the inlining functionality
for a while, in search of both optimized performance and a full understanding
of g++ behaviour.
Right now, I'm puzzled by the assembly output produced for this piece of code:
#include <iostream>
using namespace std;

class A
{
public:
    inline virtual void blah()
    {
        cout << "A" << endl;
    }
};

class B : public A
{
public:
    inline virtual void blah()
    {
        cout << "B" << endl;
    }
};

class C
{
public:
    void blah()
    {
        cout << "C" << endl;
    }
};

int main(int argc, char** argv)
{
    A* ptr = 0;
    if(argc == 1)
        ptr = new B();
    else
        ptr = new A();
    ptr->blah();
    B().blah();
    C().blah();
}
I would expect the compiler to be able to inline blah() when it is called
statically on objects of class B and C, but to use a vtable lookup for the call
ptr->blah(). Here's the relevant assembly code produced by g++ with the flags
-O3 and -S:
main:
.LFB976:
    .cfi_startproc
    subq $24, %rsp
    .cfi_def_cfa_offset 32
    cmpl $1, %edi
    movl $8, %edi
    je .L18
    call _Znwm
    movq %rax, %rdi
    movq $_ZTV1A+16, (%rax)
    movl $_ZTV1A+16, %eax
.L16:
    call *(%rax)
    movq %rsp, %rdi
    movq $_ZTV1B+16, (%rsp)
    call _ZN1B4blahEv
    movl $.LC2, %esi
    movl $_ZSt4cout, %edi
    call _ZStlsISt11char_traitsIcEERSt13basic_ostreamIcT_ES5_PKc
    movq %rax, %rdi
    call _ZSt4endlIcSt11char_traitsIcEERSt13basic_ostreamIT_T0_ES6_
    xorl %eax, %eax
    addq $24, %rsp
    .cfi_remember_state
    .cfi_def_cfa_offset 8
    ret
.L18:
    .cfi_restore_state
    call _Znwm
    movq %rax, %rdi
    movq $_ZTV1B+16, (%rax)
    movl $_ZTV1B+16, %eax
    jmp .L16
    .cfi_endproc
The puzzling part is that the call C().blah() is indeed inlined and
ptr->blah() goes through the vtable, but the call B().blah() does
neither: the static address is resolved, but the code is not inlined! (The same
behaviour occurs with a statically typed pointer to an object of class
B.) I understand that the compiler propagates the types properly, but even after
determining the correct type for the object of type B, it only resolves the
vtable reference (hence no call *(%..x)) and does not perform the inlining.
Question: why? Can someone explain to me the exact order in which the
optimizations of g++ are performed and how they interact with each other? I know
this might be tricky, but any small ray of light would be helpful. Also, did I
miss a flag that would enable g++ to do the inlining after the resolution?
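To make the cases above concrete, here is a minimal sketch of the call shapes I
am comparing. It is not the exact code I compiled: the helper names
(call_on_temporary, call_through_pointer, call_qualified) are hypothetical and
only there for illustration, and the std::ostream& parameter is an addition so
that the output can be checked. The qualified call b.B::blah() is included
because the language itself suppresses virtual dispatch for it; whether g++
then inlines the body is an assumption I have not verified.

```cpp
#include <ostream>
#include <sstream>

struct A {
    virtual ~A() = default;
    virtual void blah(std::ostream& out) { out << "A\n"; }
};

struct B : A {
    void blah(std::ostream& out) override { out << "B\n"; }
};

// Case 1: temporary whose static type is B, as in B().blah() above.
inline void call_on_temporary(std::ostream& out) {
    B().blah(out);
}

// Case 2: statically typed pointer to a B object (the parenthetical case).
inline void call_through_pointer(std::ostream& out) {
    B b;
    B* p = &b;
    p->blah(out);
}

// Case 3: qualified call; the virtual call mechanism is not used here.
inline void call_qualified(std::ostream& out) {
    B b;
    b.B::blah(out);
}
```

All three helpers call B::blah at run time; the interesting comparison is
whether the -O3 -S output shows an inlined body, a direct call to the function,
or an indirect call through the vtable for each of them.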
From a practical point of view, I understand this example does not by
itself justify an absolute need for inlining. However, I do have a time-critical
application that would get a 25-30% increase in speed if I could solve this
issue. Also, I'm just curious to understand why g++ behaves this way
(or whether it's actually a bug), because it runs counter to my most basic
intuition and to the beliefs of many people I know.
Thanks in advance for any answer to come.
Kind Regards
--
Thierry Lavoie, B.Ing., M.scA.
PhD. Student, Polytechnique Montreal
Lecturer INF2010: Data Structures and Algorithm
Lecturer LOG3210: Languages and Compilers