https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100593
Bug ID: 100593
Summary: [ELF] -fno-pic: Use GOT to take address of an external default visibility function
Product: gcc
Version: 11.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c
Assignee: unassigned at gcc dot gnu.org
Reporter: i at maskray dot me
Target Milestone: ---

Most ELF targets use an absolute relocation (e.g. R_X86_64_32) to take the address of a default visibility non-definition function declaration. The absolute relocation can cause a canonical PLT entry (st_shndx=0, st_value!=0; the term is used among a few LLD developers but is not broadly adopted).

If the defining DSO is linked with -Bsymbolic-functions (or -Bsymbolic), the addresses taken within the DSO and outside of the DSO will be different. Since C++ requires uniqueness of the address, this violates the language standard.

Outside of the GNU ELF world, many dynamic linking implementations have shifted to direct binding and non-interposition by default. We have rants from people complaining about shared object performance, e.g.

https://lore.kernel.org/lkml/CAHk-=whs8QZf3YnifdLv57+FhBi5_WeNTG1B-suOES=rcus...@mail.gmail.com/
"Re: Very slow clang kernel config .."

https://www.facebook.com/dan.colascione/posts/10107358290728348
"Python is 1.3x faster when compiled in a way that re-examines shitty technical decisions from the 1990s."

I believe ld -Bsymbolic-functions can materialize most of the savings other implementations provide, without introducing complex things to ELF. However, since -Bsymbolic-functions doesn't play well with -fno-pic's canonical PLT entries, we should fix -fno-pic.

Taking the address of a function is unlikely to be on a performance-critical path, so converting the direct access to a GOT access should cost little; let's just do it. Static linking is happy, too - the linker can either optimize out the GOT (x86-64 GOTPCRELX, PPC64 TOC) or prefill the GOT entry with a constant.

Once -fno-pic has the sane behavior (GOT by default), more and more shared objects can optionally be built with -Bsymbolic-functions, if they don't intend to support interposition, while still being compatible with -fno-pic executables.

How effective is -Bsymbolic-functions? As a data point, my x86_64 Linux kernel defconfig build is 15% faster with a Clang linked with -Bsymbolic-functions. (83% of JUMP_SLOT relocations are eliminated!)

% cat a.c
extern void fun();
void *get() { return (void*)fun; }

% gcc -fno-pic -S a.c -O2 -o -
get:
.LFB0:
        .cfi_startproc
        movl    $fun, %eax
        ret

% aarch64-linux-gnu-gcc -fno-pic -S a.c -O2 -o -
...
        adrp    x0, fun
        add     x0, x0, :lo12:fun

# good, ppc64 elfv2 always uses TOC
% powerpc64le-linux-gnu-gcc -fno-pic -S a.c -O2 -o -
...
        addis 3,2,.LC0@toc@ha
        ld 3,.LC0@toc@l(3)
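
The address-uniqueness requirement discussed above can be written down as a small check. This is only a sketch under stated assumptions: it is a single translation unit, so the two addresses always agree here, and the helper names (address_in_definer, address_in_user, addresses_agree, check_unique_address) are hypothetical and not part of the report. The comment at the top outlines an assumed two-file layout in which the comparison may fail with today's -fno-pic code generation.

```c
/*
 * Sketch of the pointer-equality invariant. To observe a mismatch across a
 * real DSO boundary one would split this into two files (assumed layout):
 *
 *   lib.c  : defines fun() and address_in_definer()
 *   main.c : defines address_in_user() and compares the two
 *
 *   gcc -fPIC -shared -Wl,-Bsymbolic-functions lib.c -o libfun.so
 *   gcc -fno-pic -no-pie main.c ./libfun.so
 *
 * In that setup the executable may resolve fun to its canonical PLT entry
 * while the DSO resolves it to its own definition.
 */
#include <assert.h>
#include <stdbool.h>

typedef void (*fn_ptr)(void);

/* Hypothetical stand-in for a function that would normally live in a DSO. */
void fun(void) {}

/* Address of fun as seen by the module that defines it. */
fn_ptr address_in_definer(void) { return fun; }

/* Address of fun as seen by a user of the module. */
fn_ptr address_in_user(void) { return fun; }

/* The language-level requirement: both views must compare equal. */
bool addresses_agree(void) {
    return address_in_definer() == address_in_user();
}

/* Aborts if the invariant is violated. */
void check_unique_address(void) {
    assert(addresses_agree());
}
```

Calling check_unique_address() from main() in the single-file form always passes; the point of the proposal is that the same check should also pass in the two-file form once -fno-pic takes the address through the GOT.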