https://gcc.gnu.org/bugzilla/show_bug.cgi?id=124475

            Bug ID: 124475
           Summary: Missed devirtualization: single-implementation vtable
                    slot not resolved when pointer loaded from aggregate
           Product: gcc
           Version: 15.1.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: thammisettytarun at gmail dot com
  Target Milestone: ---

Severity: enhancement

Target: x86_64-linux-gnu

GCC does not devirtualize calls through base-class pointers loaded from structs
or containers, even when the vtable slot has exactly one implementation
program-wide.

GCC devirtualizes when the pointer's dynamic type is visible through dataflow
(for example, &d passed directly as an argument), but loses that information
once the pointer is stored to and reloaded from an aggregate. The information
needed is already available during LTO WPA: the type inheritance graph shows a
single candidate for the slot, so pointer provenance should not matter.

The derived class is final, -flto provides whole-program visibility,
-fwhole-program ensures main is the only entry point, and -fvisibility=hidden
prevents any symbol from being interposed or accessed externally. Under these
conditions, the compiler can prove no other implementation of the vtable slot
exists.

Clang solves this with WholeProgramDevirt (WPD): during LTO, it enumerates
implementations per vtable slot from type metadata and unconditionally replaces
indirect calls at single-implementation slots with direct calls, regardless of
how the pointer was obtained.
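
To make the idea concrete, here is a toy model of the per-slot check. It is
only a sketch, not Clang's or GCC's implementation: VTableInfo and single_impl
are hypothetical names, and the "type metadata" is reduced to a list of vtables,
the static types each one is compatible with, and a slot-to-function map.

```cpp
#include <map>
#include <optional>
#include <set>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for whole-program type metadata.
struct VTableInfo {
  std::string cls;                    // class that owns this vtable
  std::set<std::string> compatible;   // static types this vtable can be reached through
  std::map<int, std::string> slots;   // slot index -> implementing function
};

// Returns the unique target for (static_type, slot) when every compatible
// vtable agrees on it; returns an empty optional when there is no candidate
// or more than one.
std::optional<std::string> single_impl(const std::vector<VTableInfo>& vtables,
                                       const std::string& static_type,
                                       int slot) {
  std::optional<std::string> target;
  for (const auto& vt : vtables) {
    if (!vt.compatible.count(static_type)) continue;  // unrelated hierarchy
    auto it = vt.slots.find(slot);
    if (it == vt.slots.end()) return std::nullopt;    // inconsistent model: give up
    if (target && *target != it->second) return std::nullopt;  // two candidates
    target = it->second;
  }
  return target;
}
```

For the reproducer below, the model would contain one entry for Derived
(compatible with Base and Derived, slot 2 mapping to Derived::f), and
single_impl(vtables, "Base", 2) yields Derived::f without looking at where the
pointer came from. Adding a second class that overrides f makes the result
empty.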

Reproducer:

  struct Base {
    virtual ~Base() = default;
    virtual int f(int x) = 0;
  };

  struct Derived final : Base {
    int val = 0;
    int f(int x) override { val += x; return val; }
  };

  struct Holder { Base* ptr; };

  __attribute__((noinline))
  int call_direct(Base* p, int x) {
    return p->f(x);
  }

  __attribute__((noinline))
  int call_from_member(Holder& h, int x) {
    return h.ptr->f(x);
  }

  int main() {
    Derived d;
    Holder h{&d};
    return call_direct(&d, 1) + call_from_member(h, 1);
  }

Compile:
  g++-15 -O3 -flto=auto -fno-fat-lto-objects -fwhole-program \
    -fvisibility=hidden -fdevirtualize -fdevirtualize-speculatively \
    -fdevirtualize-at-ltrans -o devirt_test devirt_test.cpp

Result: in call_direct the call is devirtualized and Derived::f is inlined:

  call_direct(Base*, int) [clone .constprop.0]:
    movl   $0x1,0x8(%rdi)
    mov    $0x1,%eax
    ret

call_from_member emits an indirect call through the vtable:

  call_from_member(Holder&, int) [clone .constprop.0] [clone .isra.0]:
    mov    (%rdi),%rax        # load Base* from Holder
    mov    $0x1,%esi
    mov    0x10(%rax),%rax    # load vtable slot
    jmp    *%rax              # indirect tail call, not devirtualized

Clang (with -flto=thin -fwhole-program-vtables -fvisibility=hidden)
devirtualizes and inlines both:

  call_direct(Base*, int):
    mov    0x8(%rdi),%eax
    inc    %eax
    mov    %eax,0x8(%rdi)
    ret

  call_from_member(Holder&, int):
    mov    0x8(%rdi),%eax
    inc    %eax
    mov    %eax,0x8(%rdi)
    ret

Only one user-defined vtable exists in the linked binary:

  $ nm -C devirt_test | grep 'vtable for'
  0000000000402048 r vtable for Derived

The type inheritance graph during LTO WPA has complete information: Base::f is
pure virtual, Derived is the only class that overrides it, Derived is final,
-flto gives the compiler whole-program visibility, and -fvisibility=hidden
ensures no symbol can be interposed externally. Every call through this vtable
slot must resolve to Derived::f regardless of pointer provenance.
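
For comparison, the requested transformation can be written by hand at the
source level. This is a sketch of the expected result, not a proposed fix:
call_from_member_manual is a hypothetical name, and the static_cast is only
valid under the program-wide assumption above that every Base* points to a
Derived. Because Derived is final, a call through Derived* can be bound
statically without any whole-program analysis.

```cpp
struct Base {
  virtual ~Base() = default;
  virtual int f(int x) = 0;
};

struct Derived final : Base {
  int val = 0;
  int f(int x) override { val += x; return val; }
};

struct Holder { Base* ptr; };

// Hand-devirtualized variant of call_from_member. The cast encodes the
// assumption that h.ptr always points to a Derived; since Derived is final,
// the compiler is free to call Derived::f directly instead of going through
// the vtable.
int call_from_member_manual(Holder& h, int x) {
  return static_cast<Derived*>(h.ptr)->f(x);
}
```

This is what the optimizer is being asked to derive on its own from the type
inheritance graph, so that user code does not need to carry the cast.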

Environment:
  - GCC 15.1.0 (x86_64-pc-linux-gnu)
  - Configured with: --enable-languages=c,c++,fortran,go --enable-lto
    --enable-plugin --disable-multilib
