Hello

We've got a problem with -reduce-relocations. tl;dr: it's a broken concept and
we either add a permanent workaround or we stop using it. The permanent
workaround is to compile all executables in PIC/PIE mode.

Long story:
The -reduce-relocations option in configure checks that the compiler supports
the linker flag -Bsymbolic-functions. That option was added to binutils in
2006 at our urging, so that we could use it where the -Bsymbolic option
presented problems. It turns out that -Bsymbolic-functions has the same
problems that -Bsymbolic had and is no fix.

Those two options cause the linker to bind some symbols "symbolically" within
the binary it's producing. That is, if a symbol X is used and is also defined
inside this ELF module, then this option tells the linker that it may rightly
assume that the symbol will always be inside this module. The linker will then
use cheaper types of relocation, or none at all. This is a huge performance
improvement both at load- and at run-time.

-Bsymbolic does it for everything, whereas -Bsymbolic-functions does it for
functions only.

The reason why we needed -Bsymbolic-functions in the first place is that ELF
has a weird feature that causes data variables to move between modules.
Functions weren't affected because they aren't moved.

Turns out that there is one situation in which a function is treated as data:
when you take its address. In order to compare equally, the dynamic linker
must resolve the function address to only one place, and unfortunately for us,
the choice isn't to our liking. The "canonical" address may be moved out of
the library and into the executable.

We haven't hit this problem before because we hadn't been doing function
pointer comparisons. Now, with Olivier's "new connection syntax" patch, we
are.
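
To illustrate why pointer comparison matters, here is a minimal sketch in C
(not Qt code; every name in it is hypothetical) of a lookup that identifies a
callback by comparing its address. Within a single module this always works.
The problem described above only appears when the executable and a library
linked with -Bsymbolic-functions disagree on what the function's address is.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callbacks standing in for signals; this is not Qt API. */
static void clicked(void) {}
static void toggled(void) {}

typedef void (*callback_t)(void);

/* Table of known callbacks, as a library might keep internally. */
static const callback_t known[] = { clicked, toggled };

/* Returns the index of cb in the table, or -1 if no address matches.
 * The lookup is only correct if every module sees the same address
 * for each function. */
int index_of(callback_t cb)
{
    for (size_t i = 0; i < sizeof known / sizeof known[0]; ++i) {
        if (known[i] == cb)
            return (int)i;
    }
    return -1;
}

/* Usage: inside one module the comparison behaves as expected. */
void self_check(void)
{
    assert(index_of(clicked) == 0);
    assert(index_of(toggled) == 1);
}
```

If the caller passed an address that differs from the one stored in the table
(which is what happens when the two modules resolve the symbol differently),
index_of() would return -1 even though the "same" function was meant.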

The possible workaround is to tell the compiler and linker that even
executables are position-independent. This causes the linker to stop using
copy/move relocations because it doesn't need them. However, the use of PIC
may have a non-trivial performance impact on applications, due to indirect
variable accesses and the loss of one register.

Regardless of whether I manage to convince the linker people to improve the
situation, we need to figure out a solution for existing systems. What shall we
do?


Even longer story (background):

In code that isn't position-independent (i.e., the executable), a data access
is done as:
        movl    variable, %eax

And a function call as:
        call    function

And the loading of a function address as:
        movl    $function, %edi
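
For reference, C source along the following lines leads a compiler to emit
instructions of the three kinds shown above when built without -fPIC/-fPIE.
This is only a sketch: the names "variable" and "function" come from the
snippets, the wrapper functions are hypothetical, and both symbols are defined
in the same file purely so the example builds on its own. In the scenario
discussed here they would live in a shared library.

```c
#include <assert.h>

int variable = 42;                 /* would live in the shared library */
int function(void) { return 7; }   /* likewise */

typedef int (*function_ptr)(void);

/* Data access: non-PIC code loads straight from the symbol's address. */
int read_variable(void) { return variable; }

/* Function call: non-PIC code emits a plain call to the symbol. */
int call_function(void) { return function(); }

/* Address-of: non-PIC code embeds the address as an immediate operand. */
function_ptr address_of_function(void) { return function; }

/* Usage: all three forms observe the same symbols. */
void self_check(void)
{
    assert(read_variable() == 42);
    assert(call_function() == 7);
    assert(address_of_function() == function);
}
```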


When linking this program, the linker needs to write the address of the
variable "variable" and of the function "function" into the instructions (one
is absolute and the other relative, but that's irrelevant). If both symbols
are found in a shared library, then the linker will "patch up" differently.

For the function, it will make the "call" instruction call a stub in
the Procedure Linkage Table (PLT), which then loads the proper address from
somewhere and then jumps to the proper address. That somewhere is another
structure called the Global Offset Table, which the dynamic linker will fill
with the actual function address once the library has been loaded.
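
The PLT/GOT mechanism just described can be modelled in plain C. This is a toy
model, not what the linker actually generates: the names got_slot,
function_plt and toy_load_library are hypothetical, and a real PLT stub is a
few machine instructions rather than a C function.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*func_t)(void);

/* The "real" function, standing in for code inside the shared library. */
static int function_in_library(void) { return 7; }

/* Toy GOT: one slot, empty until the "dynamic linker" fills it. */
static func_t got_slot = NULL;

/* Toy dynamic linker step: record the actual address in the GOT.
 * Must run before the stub is used, or the stub would jump through NULL. */
void toy_load_library(void) { got_slot = function_in_library; }

/* Toy PLT stub: load the address from the GOT, then jump to it. */
int function_plt(void) { return got_slot(); }

/* The executable's call site only ever names the stub. */
int caller(void) { return function_plt(); }

/* Usage: load first, then call through the stub. */
void self_check(void)
{
    toy_load_library();
    assert(caller() == 7);
}
```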

For the variable, things get complicated. There's no way to do the PLT trick.
So what the linker does instead is add a "copy relocation". It writes the name
of the variable and its expected size and reserves that much in the
executable. The dynamic linker will then, at load time, find the variable in
the shared library, copy the contents and then tell the library it should
instead find the variable in the executable's memory.
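
The copy relocation steps can also be sketched as a toy model in C. All names
below are hypothetical and a single file plays both roles; the real work is
done by the dynamic linker, not by application code. The last function shows,
under the same assumptions, what goes wrong if the library binds the variable
to its own copy, which is the effect -Bsymbolic has.

```c
#include <assert.h>
#include <string.h>

/* Library side: the variable's definition and initial contents. */
static int lib_variable = 42;
/* The library reaches the variable through a pointer (its GOT entry). */
static int *lib_got_variable = &lib_variable;
int lib_read(void) { return *lib_got_variable; }

/* Executable side: space reserved by the linker, of the expected size. */
static int exe_variable;
void exe_write(int v) { exe_variable = v; }   /* direct, non-PIC access */

/* Toy dynamic linker: process the copy relocation at load time. */
void toy_apply_copy_relocation(void)
{
    /* Copy the initial contents into the executable's reserved space. */
    memcpy(&exe_variable, &lib_variable, sizeof exe_variable);
    /* Tell the library to use the executable's copy from now on. */
    lib_got_variable = &exe_variable;
}

/* What a symbolically bound library would do: ignore the GOT entry and
 * read its own copy, which the executable never updates. */
int lib_read_symbolic(void) { return lib_variable; }

/* Usage: after the relocation, only the GOT-based read sees the write. */
void self_check(void)
{
    toy_apply_copy_relocation();
    exe_write(5);
    assert(lib_read() == 5);
    assert(lib_read_symbolic() == 42);
}
```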

When using position-independent code options (-fPIC and -fPIE), things change.
The compiler will write for the function call:
        call    function@PLT

The loading of a function address is:
        movq    function@GOTPCREL(%rip), %rdi

As for the variable, it produces:
        movq    variable@GOTPCREL(%rip), %rax
        movl    (%rax), %eax

All accesses are position-independent and indirect. The call is placed via the
PLT, addresses are loaded from the GOT and the loading of values is done after
the actual address is loaded from the GOT.

This is suitable for accessing symbols defined in other ELF modules. It's also
necessary for library code.

Unfortunately, the side-effect is that access to symbols defined in the current
ELF module is also done indirectly. Two options help change this:
-fvisibility=hidden and the symbolic options (-Bsymbolic and
-Bsymbolic-functions).

The -fvisibility=hidden option is enabled by default in Qt since 4.0 and
corresponds to the configure option -reduce-exports. It does not change the
code above, so it means that all variable accesses to variables not defined in
the same compilation unit are indirect. Fortunately for the function call, the
linker realises that the target is inside the library and cannot be anywhere
else, so the call is now made directly to the function. The loading of the
address is via the GOT, which means a run-time relocation is still necessary,
when the most efficient solution would be to use the "load effective address"
instruction with no relocation.
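
With -fvisibility=hidden, only symbols explicitly marked as "default" remain
exported. A minimal sketch of the per-symbol markup, using the GCC/Clang
visibility attribute (the function names are hypothetical, and the attribute
syntax is compiler-specific):

```c
#include <assert.h>

/* Exported: remains in the library's dynamic symbol table. */
__attribute__((visibility("default")))
int exported_function(void) { return 1; }

/* Hidden: callable from anywhere inside the library, but not from
 * other ELF modules, so calls to it need no PLT entry. */
__attribute__((visibility("hidden")))
int internal_helper(void) { return 2; }

/* A public entry point that uses both. */
int exported_sum(void) { return exported_function() + internal_helper(); }

/* Usage: behaviour inside the module is unchanged by visibility. */
void self_check(void)
{
    assert(exported_sum() == 3);
}
```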

The -Bsymbolic and -Bsymbolic-functions options produce the same effect, with
the difference that the symbol is left in the ELF export table (i.e., "default"
visibility).

The consequences of all of this are:
 1) there's absolutely no way to get the most efficient code in libraries,
period. ELF is optimised for executable code, not libraries.
 2) -Bsymbolic is a broken concept so long as copy relocations remain in use.
 3) -Bsymbolic-functions is either the same broken concept or a broken
implementation. It might be possible to salvage the option by making the
linker optimise the PLT calls like it does today, but keep the GOT references
as public.
 4) calling a function via a function pointer is inefficient because of an
indirect jump. If that function's address was taken in the executable, it's
doubly inefficient: the indirect jump you make resolves to another indirect
jump.

The only architecture not affected by this is IA-64. One reason is that the
IA-64 ABI mandates that executables also be PIC, so the original problem is
gone: there are no copy relocations. What's more, Intel engineers realised the
problem of the indirect loading of data and invented a special relocation that
the linker is allowed to relax into simpler code. If the symbol is found, at
link-time, to be in the same ELF module, the linker relaxes the "load"
generated by the compiler into a "move" between registers.

It's possible to apply the same lessons learned to other platforms, but it
hasn't been done.

--
Thiago Macieira - thiago.macieira (AT) intel.com
  Software Architect - Intel Open Source Technology Center
     Intel Sweden AB - Registration Number: 556189-6027
     Knarrarnäsgatan 15, 164 40 Kista, Stockholm, Sweden

_______________________________________________
Development mailing list
Development@qt-project.org
http://lists.qt-project.org/mailman/listinfo/development