https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118492
            Bug ID: 118492
           Summary: Move retrieval of virtual table pointers out of the loop
           Product: gcc
           Version: 15.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: antoshkka at gmail dot com
  Target Milestone: ---

Consider the sample code:

struct Base {
    virtual void foo() = 0;
};

void sample(Base& derived) {
    for (unsigned i = 0; i < 1000; ++i) {
        derived.foo();
    }
}

With -O2 GCC generates the following assembly:

sample(Base&):
        push    rbp
        mov     rbp, rdi
        push    rbx
        mov     ebx, 1000
        sub     rsp, 8
.L2:
        mov     rax, QWORD PTR [rbp+0]
        mov     rdi, rbp
        call    [QWORD PTR [rax]]
        sub     ebx, 1
        jne     .L2
        add     rsp, 8
        pop     rbx
        pop     rbp
        ret

Note that the load `mov rax, QWORD PTR [rbp+0]` is executed on each iteration,
even though the vptr should not change. The indirect call through [rax] also
re-reads the vtable slot every time. Better assembly would look like the
following, with both loads hoisted out of the loop:

sample(Base&):
        push    rbp
        push    r14
        push    rbx
        mov     rbx, rdi
        mov     rax, qword ptr [rdi]
        mov     r14, qword ptr [rax]
        mov     ebp, 1000
.LBB0_1:
        mov     rdi, rbx
        call    r14
        dec     ebp
        jne     .LBB0_1
        pop     rbx
        pop     r14
        pop     rbp
        ret

Clang performs this optimization under the -fstrict-vtable-pointers flag. Some
information on that option is available at
https://llvm.org/devmtg/2016-11/Slides/Padlewski-DevirtualizationInLLVM.pdf

Godbolt playground: https://godbolt.org/z/s36qK5e8v
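
For illustration only, the requested transformation can be sketched at the source level with a hand-rolled vtable. This is not how GCC represents virtual calls internally. The names Object, VTable, counting_foo, sample_reload, sample_hoisted and both_loops_agree are hypothetical and do not come from the report. The sketch assumes that foo never rebinds the object's vptr, which is the same assumption the report relies on.

```cpp
#include <cassert>

// Hypothetical hand-rolled equivalent of a class with one virtual function.
struct Object;

struct VTable {
    void (*foo)(Object*);   // single vtable slot, like Base::foo
};

struct Object {
    const VTable* vptr;     // stands in for the compiler-managed vptr
    unsigned calls;         // counts how many times foo ran
};

// A concrete "override" that only records the call.
void counting_foo(Object* self) { ++self->calls; }

const VTable counting_vtable = { &counting_foo };

// Mirrors the loop GCC emits today: vptr and slot are re-read each iteration.
void sample_reload(Object& obj) {
    for (unsigned i = 0; i < 1000; ++i) {
        obj.vptr->foo(&obj);
    }
}

// Mirrors the requested code: both loads are done once before the loop.
// This is only valid under the assumption that foo does not change obj.vptr.
void sample_hoisted(Object& obj) {
    void (*const fn)(Object*) = obj.vptr->foo;
    for (unsigned i = 0; i < 1000; ++i) {
        fn(&obj);
    }
}

// Runs both variants on fresh objects and checks they made the same calls.
bool both_loops_agree() {
    Object a{&counting_vtable, 0};
    Object b{&counting_vtable, 0};
    sample_reload(a);
    sample_hoisted(b);
    assert(a.calls == b.calls);
    return a.calls == 1000u && b.calls == 1000u;
}
```

With an ordinary virtual call the compiler cannot see into foo, so it has to prove or assume that the vptr is unchanged before it may turn the first form into the second.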