On 12 June 2018 at 20:33, Laszlo Ersek <ler...@redhat.com> wrote:
> Some super-naive questions, which are supposed to educate me, and not to
> question the series:
>
> On 06/12/18 17:22, Ard Biesheuvel wrote:
>> The GCC toolchain uses PIE mode when building code for X64, because it
>> is the most efficient in size: it uses relative references where
>> possible, but still uses 64-bit quantities for absolute symbol
>> references,
>
> Absolute symbol references such as? References to fixed (constant)
> addresses?
>

I should have been clearer here. From the GCC man page:

"""
       -mcmodel=small
           Generate code for the small code model: the program and its
           symbols must be linked in the lower 2 GB of the address space.
           Pointers are 64 bits.  Programs can be statically or
           dynamically linked.  This is the default code model.

       -mcmodel=kernel
           Generate code for the kernel code model.  The kernel runs in
           the negative 2 GB of the address space.  This model has to be
           used for Linux kernel code.

       -mcmodel=medium
           Generate code for the medium model: the program is linked in
           the lower 2 GB of the address space.  Small symbols are also
           placed there.  Symbols with sizes larger than
           -mlarge-data-threshold are put into large data or BSS sections
           and can be located above 2GB.  Programs can be statically or
           dynamically linked.

       -mcmodel=large
           Generate code for the large model.  This model makes no
           assumptions about addresses and sizes of sections.
"""

Formerly, we used the large model because UEFI can load PE/COFF
executables anywhere in the lower address space, not only in the first
2 GB. The small PIE model is the best fit for UEFI because it does not
have this limitation, but [unlike the large model] only uses absolute
references when necessary, and will use relative references when it
can. (I.e., it assumes the program will fit in 4 GB of memory, which
the large model does not.)

Absolute symbol references are things like statically initialized
function pointer variables or other quantities whose value cannot be
obtained programmatically at runtime using a relative reference.
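
To make the distinction concrete, here is a minimal sketch. The names
(Answer, gHandler, CallDirect, CallViaPointer) are made up for this mail
and do not come from edk2. The initializer of gHandler is the kind of
absolute symbol reference I mean, while the direct call can be encoded
relative to the instruction pointer.

```c
#include <assert.h>

static int Answer(void) { return 42; }

/* Statically initialized function pointer: the initial value stored in the
   data section is the absolute address of Answer(), which is only known once
   the image has been loaded. This needs a 64-bit absolute relocation. */
int (*gHandler)(void) = Answer;

/* Direct call: the compiler can encode this as an offset from the current
   instruction, so nothing has to be fixed up at load time. */
int CallDirect(void) { return Answer(); }

/* Indirect call: reads the (relocated) absolute address from gHandler. */
int CallViaPointer(void)
{
  assert(gHandler != 0);
  return gHandler();
}
```

Both calls return the same value; the difference is only in which
relocations the object file has to carry.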

>> which is optimal for executables that need to be converted
>> to PE/COFF using GenFw.
>
> Why is that approach optimal? As few relocation records are required as
> possible?
>

Because GenFw translates ELF relocations into PE/COFF relocations, but
only for the subset that requires fixing up at runtime. Relative
references do not require such fixups, so a code model that minimizes
the number of absolute relocations is therefore optimal. Note that
absolute references typically require twice the space as well.
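
As a rough illustration of that filtering step (this is NOT the actual
GenFw code, just a sketch under simplifying assumptions): the enum names
and both functions below are hypothetical, only the numeric values are the
relocation type numbers from the x86-64 ELF psABI. The sketch deliberately
ignores the 32-bit absolute and GOT-based relocation types.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Relocation type numbers as assigned by the x86-64 ELF psABI.
   The enumerator names are local to this sketch. */
enum {
  RelocAbs64 = 1,  /* R_X86_64_64:    64-bit absolute address     */
  RelocPc32  = 2,  /* R_X86_64_PC32:  32-bit PC-relative offset   */
  RelocPlt32 = 4   /* R_X86_64_PLT32: 32-bit PC-relative via PLT  */
};

/* Only absolute references change when the image is loaded at a different
   address, so only those need a PE/COFF base relocation. */
bool NeedsBaseRelocation(uint32_t Type)
{
  return Type == RelocAbs64;
}

/* Count how many entries of an ELF relocation list would have to be
   carried over into the PE/COFF image. */
size_t CountBaseRelocations(const uint32_t *Types, size_t Count)
{
  size_t Needed = 0;
  for (size_t Index = 0; Index < Count; Index++) {
    if (NeedsBaseRelocation(Types[Index])) {
      Needed++;
    }
  }
  return Needed;
}
```

The fewer absolute entries the compiler emits, the smaller the resulting
base relocation table.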

>> Enabling PIE mode has a couple of side effects though, primarily caused
>> by the fact that the primary application area of GCC is to build programs
>> for userland. GCC will assume that ELF symbols should be preemptible (which
>> makes sense for PIC but not for PIE,
>
> Why don't preemptible symbols make sense for PIE?
>
> For example, if a userspace program loads a plugin with dlopen(), and
> the plugin (.so) uses helper functions from the main executable, then
> the main executable has to be (well, had to be, earlier?) built with
> "-rdynamic". Wouldn't this mean the main executable could both be PIE
> and sensibly have preemptible symbols?
>
> (My apologies if I'm disturbingly ignorant about this and the question
> doesn't even make sense.)
>

I mean that the symbols defined by the PIE executable [i.e., not
shared library] can never be preempted. Only symbols in shared
libraries can be preempted by the symbols in the main executable, not
the other way around.

>> but this simply seems to be the result
>> of code being shared between the two modes), and it will attempt to keep
>> absolute references close to each other so that dynamic relocations that
>> trigger CoW for text pages have the smallest possible footprint.
>
> So... Given this behavior, why is it a problem for us? What are the bad
> symptoms? What is currently broken?
>

The bad symptoms are that PIC code will use GOT entries for all symbol
references, meaning that instead of a direct relative reference from
the code, it will emit a relative reference to the GOT entry
containing the absolute address of the symbol. This involves an
additional memory reference, and it requires the GOT entry (which by
definition contains an absolute address) to be fixed up at load time.

What is broken [as reported by Zenith432] is that GCC in LTO mode may
in some cases still emit GOT based relocations that GenFw currently
cannot handle. If the address of a symbol is used in a calculation, or
when the address of a symbol is taken but not dereferenced (but only
passed to a function, for instance), GCC in -Os mode will optimize
this into a GOTPCREL reference.
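
A minimal sketch of the pattern in question follows. All names are
hypothetical, and AsmRoutine is defined in C here only so that the example
builds on its own; in the problematic case it would be defined in a
separate assembly source file. With default visibility and PIC/PIE code
generation, I would expect the address passed in Setup() to be a candidate
for a GOT load rather than a direct relative reference.

```c
typedef void (*ROUTINE)(void);

/* Stand-in for a routine that lives in an assembly file in the real case. */
void AsmRoutine(void) {}

static ROUTINE mRegistered;

/* Stores the pointer without calling through it. */
void Register(ROUTINE Routine) { mRegistered = Routine; }

/* The address of AsmRoutine is taken and passed on, but never dereferenced
   here. This is the kind of use that can end up as a GOTPCREL reference. */
void Setup(void) { Register(AsmRoutine); }

/* Helper so the effect of Setup() can be observed. */
int IsRegistered(void) { return mRegistered == AsmRoutine; }
```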

Quoting from a private email from Zenith432 (who has already proposed
GenFw changes to handle these relocations):

"""
I figured out what's going on with LTO build in GCC5 that is compiled
with -Os -flto -DUSING_LTO and does not use visibility #pragma.

When compiling with LTO enabled, what happens is that all C source
files are transformed during compilation stage to LTO intermediate
bytecode (gimple in GCC).

Then, when the static link (ld) takes place, all LTO intermediate
bytecode is sent back to the compiler's code-generation backend to have
machine code generated for it as if all the source code were one big C
source file ("whole program optimization").

As a result of this, all the extern symbols become local symbols, like
file-level statics, because it's as if all the code is in one big
source file.  Since there is no dynamic linking, there are no more
"extern" symbols, and all symbols are like file-level statics and
treated the same.

This is why the LTO build stops emitting GOT loads for
size-optimization purposes.  GCC doesn't emit GOT loads for file-level
static, and in LTO build they're all like that - so no GOT loads.

But there is still something that fouls this up: an extern symbol that
is defined in an assembly source file.

Assembly source files don't participate in LTO.  They are transformed
by the assembler into X64 machine code.  During ld, any extern symbol
that is defined in an assembly source file and declared and used by a C
source file is treated as before, like an external symbol.  This means
the code generator can go back to its practice of emitting GOT loads if
they reduce code size.
"""

Instead of 'fixing' GenFw, I attempted to go back to the original
changes Steven and I did for LTO, to try and remember why we could not
use the GCC visibility #pragma when enabling LTO. That is the issue
this series aims to fix (but it is an RFC, so comments welcome).
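
For reference, the GCC visibility #pragma mentioned above is used roughly
like this. This is a sketch only: gCounter and ReadCounter are made-up
names, and the way the pragma is actually applied to the edk2 build is not
shown here. Declarations seen between push and pop get hidden visibility,
which tells the compiler the symbol cannot be preempted, so it has no
reason to go through the GOT to reach it.

```c
/* Everything declared in this region gets hidden ELF visibility. */
#pragma GCC visibility push(hidden)
extern int gCounter;
int ReadCounter(void);
#pragma GCC visibility pop

/* The definitions inherit the visibility of the earlier declarations. */
int gCounter = 7;

/* With hidden visibility, the compiler may reference gCounter directly,
   relative to the instruction pointer. */
int ReadCounter(void) { return gCounter; }
```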

-- 
Ard.