https://gcc.gnu.org/bugzilla/show_bug.cgi?id=124475
Bug ID: 124475
Summary: Missed devirtualization: single-implementation vtable slot not resolved when pointer loaded from aggregate
Product: gcc
Version: 15.1.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: thammisettytarun at gmail dot com
Target Milestone: ---
Severity: enhancement
Target: x86_64-linux-gnu
GCC does not devirtualize calls through base-class pointers loaded from structs
or containers, even when the vtable slot has exactly one implementation
program-wide.
GCC devirtualizes when the dynamic type is visible through dataflow (for example,
when &d is passed directly as the argument), but loses the type information once
the pointer has been stored to and reloaded from any aggregate.
The information needed is already available during the LTO whole-program
analysis (WPA) stage: the type inheritance graph shows a single candidate for
the slot, so pointer provenance should not matter.
The derived class is final, -flto provides whole-program visibility,
-fwhole-program ensures main is the only entry point, and -fvisibility=hidden
prevents any symbol from being interposed or accessed externally. Under these
conditions, the compiler can prove no other implementation of the vtable slot
exists.
Clang solves this with WholeProgramDevirt (WPD): during LTO, it enumerates
implementations per vtable slot from type metadata and unconditionally replaces
indirect calls at single-implementation slots with direct calls, regardless of
how the pointer was obtained.
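The per-slot enumeration described above can be illustrated with a toy model. This is only a sketch under stated assumptions: it is not Clang's or GCC's implementation, and ClassInfo, Hierarchy, final_overrider, unique_target and reproducer_hierarchy are hypothetical names invented for illustration. It models the class hierarchy from the reproducer in this report as a map and reports a call target only when every class derived from the static type agrees on one implementation. It needs C++17.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Toy model of a class: its base and the virtual slots it declares.
struct ClassInfo {
    std::string base;                   // empty for a root class
    std::map<std::string, bool> slots;  // slot -> true if it has a body, false if pure
};
using Hierarchy = std::map<std::string, ClassInfo>;

// True if `cls` is `base` or (transitively) derives from it.
bool derives_from(const Hierarchy& h, std::string cls, const std::string& base) {
    while (!cls.empty()) {
        if (cls == base) return true;
        cls = h.at(cls).base;
    }
    return false;
}

// Final overrider of `slot` as seen from `cls`, or "" if it is still pure.
std::string final_overrider(const Hierarchy& h, std::string cls, const std::string& slot) {
    while (!cls.empty()) {
        const ClassInfo& info = h.at(cls);
        auto it = info.slots.find(slot);
        if (it != info.slots.end())
            return it->second ? cls + "::" + slot : std::string();
        cls = info.base;
    }
    return std::string();
}

// The single possible target for a call to `slot` through a `base` pointer,
// or "" if there are zero or several candidates. Classes whose slot is still
// pure are skipped, since they cannot be instantiated.
std::string unique_target(const Hierarchy& h, const std::string& base, const std::string& slot) {
    std::set<std::string> targets;
    for (const auto& [name, info] : h) {
        if (!derives_from(h, name, base)) continue;
        std::string t = final_overrider(h, name, slot);
        if (!t.empty()) targets.insert(t);
    }
    return targets.size() == 1 ? *targets.begin() : std::string();
}

// Hierarchy from this report: Base::f is pure, Derived::f has a body.
Hierarchy reproducer_hierarchy() {
    Hierarchy h;
    h["Base"] = ClassInfo{"", {{"f", false}}};
    h["Derived"] = ClassInfo{"Base", {{"f", true}}};
    return h;
}
```

For the reproducer's hierarchy, unique_target(h, "Base", "f") yields "Derived::f". Adding a second class that overrides f makes the result empty, which is the case where such a transformation would have to give up. Note that the decision depends only on the hierarchy, not on how the pointer reached the call site.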
Reproducer:
struct Base {
    virtual ~Base() = default;
    virtual int f(int x) = 0;
};
struct Derived final : Base {
    int val = 0;
    int f(int x) override { val += x; return val; }
};
struct Holder { Base* ptr; };
__attribute__((noinline))
int call_direct(Base* p, int x) {
    return p->f(x);
}
__attribute__((noinline))
int call_from_member(Holder& h, int x) {
    return h.ptr->f(x);
}
int main() {
    Derived d;
    Holder h{&d};
    return call_direct(&d, 1) + call_from_member(h, 1);
}
Compile:
g++-15 -O3 -flto=auto -fno-fat-lto-objects -fwhole-program \
-fvisibility=hidden -fdevirtualize -fdevirtualize-speculatively \
-fdevirtualize-at-ltrans -o devirt_test devirt_test.cpp
Result — call_direct is fully devirtualized and inlined:
call_direct(Base*, int) [clone .constprop.0]:
    movl $0x1,0x8(%rdi)
    mov $0x1,%eax
    ret
call_from_member emits an indirect call through the vtable:
call_from_member(Holder&, int) [clone .constprop.0] [clone .isra.0]:
    mov (%rdi),%rax # load vptr from the object (the .isra clone receives h.ptr in %rdi)
    mov $0x1,%esi
    mov 0x10(%rax),%rax # load vtable slot for f
    jmp *%rax # indirect tail call, not devirtualized
Clang (with -flto=thin -fwhole-program-vtables -fvisibility=hidden)
devirtualizes and inlines both:
call_direct(Base*, int):
    mov 0x8(%rdi),%eax
    inc %eax
    mov %eax,0x8(%rdi)
    ret
call_from_member(Holder&, int):
    mov 0x8(%rdi),%eax
    inc %eax
    mov %eax,0x8(%rdi)
    ret
Only one user-defined vtable exists in the linked binary:
$ nm -C devirt_test | grep 'vtable for'
0000000000402048 r vtable for Derived
The type inheritance graph during LTO WPA has complete information: Base::f is
pure virtual, Derived is the only class that overrides it, Derived is final,
-flto gives the compiler whole-program visibility, and -fvisibility=hidden
ensures no symbol can be interposed externally. Every call through this vtable
slot must resolve to Derived::f regardless of pointer provenance.
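As a source-level illustration of the requested result, the sketch below writes out by hand what a devirtualized call_from_member would be equivalent to. It is not compiler output. call_from_member_devirt and demo are hypothetical names added for illustration, and the static_cast is only valid under the closed-world assumption argued above (Derived is the sole implementer of Base::f).

```cpp
#include <cassert>

struct Base {
    virtual ~Base() = default;
    virtual int f(int x) = 0;
};

struct Derived final : Base {
    int val = 0;
    int f(int x) override { val += x; return val; }
};

struct Holder { Base* ptr; };

// As written in the reproducer: dispatches through the vtable.
int call_from_member(Holder& h, int x) {
    return h.ptr->f(x);
}

// Hand-devirtualized form (hypothetical). The qualified call Derived::f
// bypasses the vtable, so the compiler sees a direct, inlinable call.
// Valid only if every Base object reaching here is really a Derived.
int call_from_member_devirt(Holder& h, int x) {
    return static_cast<Derived*>(h.ptr)->Derived::f(x);
}

// Hypothetical check that both forms compute the same values.
int demo() {
    Derived a, b;
    Holder ha{&a}, hb{&b};
    int r1 = call_from_member(ha, 1);         // a.val becomes 1
    r1 += call_from_member(ha, 2);            // a.val becomes 3
    int r2 = call_from_member_devirt(hb, 1);  // b.val becomes 1
    r2 += call_from_member_devirt(hb, 2);     // b.val becomes 3
    assert(r1 == r2);
    return r2;
}
```

The two functions are observably identical for this program. The request is for GCC to derive the second form from the first during LTO when the type inheritance graph shows a single candidate.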
Environment:
- GCC 15.1.0 (x86_64-pc-linux-gnu)
- Configured with: --enable-languages=c,c++,fortran,go --enable-lto
--enable-plugin --disable-multilib