https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100593

            Bug ID: 100593
           Summary: [ELF] -fno-pic: Use GOT to take address of an external
                    default visibility function
           Product: gcc
           Version: 11.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
          Assignee: unassigned at gcc dot gnu.org
          Reporter: i at maskray dot me
  Target Milestone: ---

Most ELF targets use an absolute relocation (e.g. R_X86_64_32) to take the
address of a default visibility function that is declared but not defined.
The absolute relocation can cause the linker to create a canonical PLT entry
(st_shndx=0, st_value!=0; the term is used by a few LLD developers but is not
broadly adopted).
If the defining DSO is linked with -Bsymbolic-functions (or -Bsymbolic), the
address taken within the DSO and the address taken outside of it will differ.
Since C++ requires uniqueness of the address, this violates the language
standard.
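
To make the "st_shndx=0, st_value!=0" description concrete, here is a minimal
sketch of how a tool might recognize such a dynamic symbol table entry. The
helper name is_canonical_plt_entry is hypothetical and not part of any
toolchain API; the STT_FUNC check is my assumption (the text above only gives
the st_shndx/st_value condition), and <elf.h> is assumed to be the glibc/Linux
header.

```c
#include <elf.h>      /* Elf64_Sym, SHN_UNDEF, STT_FUNC, ELF64_ST_TYPE */
#include <stdbool.h>

/* Hypothetical helper: returns true if a 64-bit dynamic symbol looks like a
 * canonical PLT entry, i.e. it is undefined (st_shndx == SHN_UNDEF == 0) yet
 * carries a non-zero st_value (the address of the PLT entry in the
 * executable). Restricting the check to STT_FUNC is an assumption. */
bool is_canonical_plt_entry(const Elf64_Sym *sym)
{
    return ELF64_ST_TYPE(sym->st_info) == STT_FUNC &&
           sym->st_shndx == SHN_UNDEF &&
           sym->st_value != 0;
}
```

An ordinary undefined function symbol has st_value == 0, so the helper
rejects it; only the entries whose address the executable has pinned to its
own PLT are reported.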

Outside of the GNU ELF world, many dynamic linking implementations have shifted
to direct binding and non-interposition by default.
We have rants from people complaining about shared object performance.
(e.g.
https://lore.kernel.org/lkml/CAHk-=whs8QZf3YnifdLv57+FhBi5_WeNTG1B-suOES=rcus...@mail.gmail.com/
"Re: Very slow clang kernel config .."
https://www.facebook.com/dan.colascione/posts/10107358290728348 "Python is 1.3x
faster when compiled in a way that re-examines shitty technical decisions from
the 1990s.")
I believe ld -Bsymbolic-functions can realize most of the savings other
implementations provide, without introducing complex things to ELF.
However, since -Bsymbolic-functions doesn't play well with -fno-pic's canonical
PLT entries, we should fix -fno-pic.

Taking the address of a function cannot be on a performance-critical path, so
converting the direct access to a GOT access costs nothing that matters.
Let's just do it.
Static linking is happy, too - the linker can either optimize out the GOT
(x86-64 GOTPCRELX, PPC64 TOC) or prefill the GOT entry with
a constant.

Once -fno-pic has the sane behavior (GOT by default), more and more shared
objects that don't intend to support interposition can opt into
-Bsymbolic-functions while still being compatible with -fno-pic executables.

How effective is -Bsymbolic-functions? As a data point, my x86_64 Linux kernel
defconfig build is 15% faster with a Clang that was linked with
-Bsymbolic-functions.
(83% of the JUMP_SLOT relocations are eliminated!)

% cat a.c
extern void fun();
void *get() { return (void*)fun; }

% gcc -fno-pic -S a.c -O2 -o -
get:
.LFB0:
        .cfi_startproc
        movl    $fun, %eax
        ret
% aarch64-linux-gnu-gcc -fno-pic -S a.c -O2 -o -
...
        adrp    x0, fun
        add     x0, x0, :lo12:fun

# good, ppc64 elfv2 always uses TOC
% powerpc64le-linux-gnu-gcc -fno-pic -S a.c -O2 -o -
...
        addis 3,2,.LC0@toc@ha
        ld 3,.LC0@toc@l(3)
