https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118492

            Bug ID: 118492
           Summary: Move retrieval of virtual table pointers out of the
                    loop
           Product: gcc
           Version: 15.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: antoshkka at gmail dot com
  Target Milestone: ---

Consider the sample code:

struct Base {
    virtual void foo() = 0;
};

void sample(Base& derived) {
    for (unsigned i = 0; i < 1000; ++i) {
        derived.foo();
    }
}

With -O2 GCC generates the following assembly:

sample(Base&):
        push    rbp
        mov     rbp, rdi
        push    rbx
        mov     ebx, 1000
        sub     rsp, 8
.L2:
        mov     rax, QWORD PTR [rbp+0]
        mov     rdi, rbp
        call    [QWORD PTR [rax]]
        sub     ebx, 1
        jne     .L2
        add     rsp, 8
        pop     rbx
        pop     rbp
        ret

Note that `mov rax, QWORD PTR [rbp+0]` is executed on each iteration, even
though the vptr should not change between calls.

More efficient assembly could look like the following:

sample(Base&):
        push    rbp
        push    r14
        push    rbx
        mov     rbx, rdi
        mov     rax, qword ptr [rdi]
        mov     r14, qword ptr [rax]
        mov     ebp, 1000
.LBB0_1:
        mov     rdi, rbx
        call    r14
        dec     ebp
        jne     .LBB0_1
        pop     rbx
        pop     r14
        pop     rbp
        ret
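
The difference between the two listings can be sketched at source level. The
snippet below is only an illustration: it replaces the compiler-generated
vtable with a hand-written function-pointer table, and all names in it
(Object, VTable, count_call, sample_reload, sample_hoisted,
both_variants_agree) are hypothetical and not part of GCC or of the sample
above. It assumes, as the requested optimization does, that the vptr is not
changed by the call.

```cpp
#include <cassert>

// Hypothetical stand-in for a polymorphic object: the vptr is an explicit
// member instead of a hidden, compiler-managed one.
struct Object;

struct VTable {
    void (*foo)(Object&);  // single virtual slot
};

struct Object {
    const VTable* vptr;  // plays the role of the hidden vptr
    unsigned calls;      // counts how often foo ran
};

void count_call(Object& self) { ++self.calls; }

const VTable counting_vtable{&count_call};

// Mirrors the first listing: the vptr and the slot are loaded again on
// every iteration.
void sample_reload(Object& obj) {
    for (unsigned i = 0; i < 1000; ++i) {
        obj.vptr->foo(obj);
    }
}

// Mirrors the second listing: the vptr and the slot are loaded once before
// the loop. This is only equivalent if the call never changes obj.vptr.
void sample_hoisted(Object& obj) {
    void (*const foo)(Object&) = obj.vptr->foo;
    for (unsigned i = 0; i < 1000; ++i) {
        foo(obj);
    }
}

// Runs both variants on fresh objects and checks that each one made
// exactly 1000 calls.
bool both_variants_agree() {
    Object a{&counting_vtable, 0};
    Object b{&counting_vtable, 0};
    sample_reload(a);
    sample_hoisted(b);
    return a.calls == 1000 && b.calls == 1000;
}
```

With an explicit pointer member like this, the compiler cannot hoist the load
on its own, because the callee could legally overwrite obj.vptr. For a real
vptr the situation is different, which is what this report is about.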

Clang performs this optimization under the -fstrict-vtable-pointers flag. Some
information on that option is available at
https://llvm.org/devmtg/2016-11/Slides/Padlewski-DevirtualizationInLLVM.pdf

Godbolt playground: https://godbolt.org/z/s36qK5e8v
