Hi,
I want to share some results about relocations during loading (with
RTLD_LAZY).
Relocation counts for a single .so (libqt5), without the optimization:
R_ARM_GLOB_DAT: 1585
R_ARM_RELATIVE: 9823
R_ARM_ABS32: 19489
R_ARM_JUMP_SLOT: 16998
Relocation counts for a single .so (libqt5), with the optimization:
R_ARM_GLOB_DAT: 1578
R_ARM_RELATIVE: 28227
R_ARM_ABS32: 435
R_ARM_JUMP_SLOT: 290
The only optimization done here is changing the visibility of exported
symbols from "default" to "protected", thanks to Thiago's blog ;).
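For readers unfamiliar with this change, here is a minimal sketch of what switching visibility looks like in source, assuming GCC or Clang on an ELF target. MYLIB_EXPORT, mylib_answer and mylib_twice are hypothetical names for illustration only; Qt's real export macros are defined differently.

```cpp
// Hypothetical export macro (an assumption, not Qt's actual definition).
// "protected" keeps the symbol exported, but references made from inside
// the same shared object bind to the local definition.
#if defined(__GNUC__) && defined(__ELF__)
#  define MYLIB_EXPORT __attribute__((visibility("protected")))
#else
#  define MYLIB_EXPORT
#endif

// Exported function, still callable from other libraries.
MYLIB_EXPORT int mylib_answer() { return 42; }

// Caller inside the same library: with protected visibility the compiler
// and linker may bind this call directly instead of going through the PLT.
MYLIB_EXPORT int mylib_twice() { return 2 * mylib_answer(); }
```

Building this into a shared object and running `readelf -r` on it, once with "default" and once with "protected" in the macro, is one way to compare the resulting relocation tables.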
So:
- the R_ARM_JUMP_SLOT relocations are reduced significantly, but these are
only resolved at run time (because of RTLD_LAZY), so they are irrelevant to
loading performance.
- the R_ARM_RELATIVE relocations increased, but this type of relocation is
very fast.
- for loading time, the real bottleneck is the R_ARM_ABS32 relocations,
which are now reduced by about 98% (19489 to 435)!
As a result, the overall loading time dropped from ~10-20s to ~1s.
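The arithmetic behind these figures can be checked with a short sketch. The counts are copied from the two listings above. The helper names are hypothetical. Treating GLOB_DAT plus ABS32 as "the relocations that need a symbol lookup at load time" is an assumption about how the loader behaves under RTLD_LAZY, not something measured here.

```cpp
#include <cstdio>

// Relocation counts per type, copied from the listings in this mail.
struct RelocCounts {
    long glob_dat, relative, abs32, jump_slot;
    constexpr long total() const { return glob_dat + relative + abs32 + jump_slot; }
};

constexpr RelocCounts kBefore{1585, 9823, 19489, 16998};
constexpr RelocCounts kAfter{1578, 28227, 435, 290};

// Percentage by which a count dropped.
double reduction_percent(long before, long after) {
    return 100.0 * static_cast<double>(before - after) / static_cast<double>(before);
}

// Assumption: under RTLD_LAZY only GLOB_DAT and ABS32 need a symbol lookup
// while the library is being loaded; JUMP_SLOT is deferred and RELATIVE
// needs no lookup at all.
constexpr long eager_symbolic(const RelocCounts& c) { return c.glob_dat + c.abs32; }

void print_summary() {
    std::printf("ABS32 reduction:     %.1f%%\n",
                reduction_percent(kBefore.abs32, kAfter.abs32));
    std::printf("JUMP_SLOT reduction: %.1f%%\n",
                reduction_percent(kBefore.jump_slot, kAfter.jump_slot));
    std::printf("eager symbolic:      %ld -> %ld\n",
                eager_symbolic(kBefore), eager_symbolic(kAfter));
}
```

Calling print_summary() from a main() prints the per-type reductions. Under the assumption above, the number of load-time symbol lookups falls from 21074 to 2013.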
But I still have some questions about the R_ARM_ABS32 relocations.
It seems that if a function is virtual (with "default" visibility), it is
added to .rel.dyn as an R_ARM_ABS32 entry, for example:
007b0124 0011a802 R_ARM_ABS32 00311e4b
_ZN20QEventDispatcherUNIX13processEventsE6QFlagsIN10QEventLoop17ProcessEventsFlagEE
Could someone help with the questions below:
1. Why does a virtual function with "default" visibility need a relocation
even if it is implemented inside the library?
2. When changed to "protected" visibility, I guess it is optimized into a
GOT.PLT entry with an R_ARM_RELATIVE relocation; is that true?
Thanks,
Song
-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of
ext Thiago Macieira
Sent: Tuesday, July 24, 2012 10:29 PM
To: [email protected]
Subject: Re: [Development] how to reduce the relocation <-- Use static qt
libraries
On Tuesday, 24 July 2012 13.22.25, [email protected] wrote:
> Yes, the bottleneck of the loading now is the local relocations
> instead of inter-library's.
>
> So what we want to do will be reducing the number of local relocation.
>
> Based on my understanding, this local relocation should be caused by
> the "symbol inter-positioning".
That's not exactly the case. Some types of relocations do permit symbol
interpositioning. But some types of code require relocations even if they're
not interposable.
In my listing, all the "local" relocations are non-interposable.
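One common case, sketched here as an illustration rather than taken from the listing: any data word that stores an absolute address, such as a vtable slot or a table of function pointers. The library's load address is not known until run time, so each such word needs a dynamic relocation whether or not the symbol can be interposed. All names below are hypothetical.

```cpp
// Hypothetical class with a virtual function. Its vtable is a data table
// whose slots hold the absolute addresses of the virtual functions.
struct Dispatcher {
    virtual ~Dispatcher() = default;
    virtual bool processEvents(int flags) { return flags != 0; }
};

// Hand-written equivalent of one vtable slot: a data word initialized with
// the address of a function. In a shared object this word cannot be final
// at link time, so the linker emits a dynamic relocation for it.
using Fn = bool (*)(int);
bool process_events_impl(int flags) { return flags != 0; }
Fn const slot_table[] = { &process_events_impl };

// Both calls read an address out of a relocated data word before jumping.
bool call_through_slot(int flags) { return slot_table[0](flags); }
bool call_virtual(Dispatcher& d, int flags) { return d.processEvents(flags); }
```

A plausible reading, not confirmed here: when the target symbol may be preempted, the slot gets a symbolic relocation that needs a lookup (R_ARM_ABS32 on ARM), and when it is known to bind locally, the slot only needs the load base added (R_ARM_RELATIVE).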
More information:
http://www.macieira.org/blog/2012/01/sorry-state-of-dynamic-libraries-on-linux/
http://www.macieira.org/blog/2012/01/update-and-benchmark-on-the-dynamic-library-proposals/
> And from gcc option -Bsymbolic:
> "
> When creating a shared library, bind references to global symbols to
> the definition within the shared library, if any. Normally, it is
> possible for a program linked against a shared library to override the
> definition within the shared library. This option is only meaningful
> on ELF platforms which support shared libraries. "
>
> But for my case, it's not needed to override the definition within the
> libqt5.so.
Yes, it is.
But you didn't realise that your code requires relocations even if the symbols
can't be overridden.
To avoid those relocations, you need fully position-*dependent* code that
can't be moved. Executables on Linux are like that, but all libraries are
movable in memory, even those compiled without -fPIC.
Since you're not running Linux, check if your OS supports that. Note that
you'll need to know the exact load address at build time and that it must match
the loaded address for the ROM if you want to do XIP.
> So, besides the prelink solution, I think the compiler (I mean
> armlink) should provide the ability to disable this symbol
> inter-positioning, just like the -Bsymbolic in gcc.
>
> Does anyone have idea from the compiler point of view ?
Sorry, you're barking up the wrong tree.
Your only option to reduce the number of relocations is to prelink to the exact
load address. There are two ways of doing that:
1) the ELF prelinker, which prelinks all relocations to a given address, but
does still allow relocating if the shared object is loaded at a different
address. The code is PIC, so XIP should work just fine.
2) compile without PIC and prelink at a specific address at link time, which
means that the code must be loaded there or it will fail to run. This is the
Windows DLL model.
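The difference between the two options can be sketched with a toy model. This is not a real ELF loader or the prelink tool. ToyImage, load_relocatable and load_fixed are hypothetical names, and the model only tracks words that hold addresses computed for an assumed base.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy image: some words hold addresses that were computed for linked_base.
struct ToyImage {
    std::uint32_t linked_base;               // address assumed at (pre)link time
    std::vector<std::uint32_t> words;        // image contents
    std::vector<std::size_t> reloc_offsets;  // indices of words holding addresses
};

// Option 1 (prelinked, still relocatable): nothing to do when loaded at the
// prelinked address, otherwise every recorded word is patched.
// Returns the number of words patched.
std::size_t load_relocatable(ToyImage& img, std::uint32_t actual_base) {
    if (actual_base == img.linked_base) return 0;  // stored values already correct
    const std::uint32_t delta = actual_base - img.linked_base;
    for (std::size_t off : img.reloc_offsets) img.words[off] += delta;
    img.linked_base = actual_base;
    return img.reloc_offsets.size();
}

// Option 2 (fixed address, no relocation information): the image is only
// usable at the address it was linked for.
bool load_fixed(const ToyImage& img, std::uint32_t actual_base) {
    return actual_base == img.linked_base;
}
```

In this model, option 1 costs nothing when the address matches and falls back to patching otherwise, while option 2 simply refuses any other address.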
>
> Also I see that Qt also uses the "-Bsymbolic-functions" to do some
> optimization, is that similar case to reduce the relocation ?
Yes. Read my blogs for more detail.
--
Thiago Macieira - thiago.macieira (AT) intel.com
Software Architect - Intel Open Source Technology Center
Intel Sweden AB - Registration Number: 556189-6027
Knarrarnäsgatan 15, 164 40 Kista, Stockholm, Sweden