On Wed, May 06, 2015 at 11:44:57AM -0700, H.J. Lu wrote:
> On Wed, May 6, 2015 at 11:37 AM, Rich Felker <dal...@libc.org> wrote:
> > On Wed, May 06, 2015 at 11:26:29AM -0700, H.J. Lu wrote:
> >> On Wed, May 6, 2015 at 10:35 AM, Rich Felker <dal...@libc.org> wrote:
> >> > On Wed, May 06, 2015 at 07:43:58PM +0300, Alexander Monakov wrote:
> >> >> On Wed, 6 May 2015, Jakub Jelinek wrote:
> >> >> > The linker would know very well what kind of relocations are used for
> >> >> > particular PLT slot, and for the new relocations which would resolve
> >> >> > to the address of the .got.plt slot it could just tweak corresponding
> >> >> > 3rd insn in the slot, to not jump to first plt slot - 16, but a few
> >> >> > bytes before that that would just load the address of _G_O_T_ into
> >> >> > %ebx and then fallthru into the 0x4c2b7310 snippet above. The lazy
> >> >> > binding would be a few ticks slower in that case, but no requirement
> >> >> > on %ebx to contain _G_O_T_.
> >> >>
> >> >> No, %ebx is callee-saved, so you can't outright overwrite it in the PLT
> >> >> stub.
> >> >
> >> > Indeed. And the situation is the same on almost all targets. The only
> >> > exceptions are those with direct PC-relative addressing (like x86_64)
> >> > and those with reserved inter-procedural linkage registers and
> >> > efficient PC-relative address loading via them (like ARM and AArch64).
> >> > MIPS (o32) is also an interesting exception in that the normal ABI is
> >> > already PLT-free, and while callees need a PIC register loaded, it's a
> >> > call-clobbered register, not a call-saved one, so it doesn't make the
> >> > same kind of trouble,
> >> >
> >> > I really don't see a need to make no-PLT code gen support lazy binding
> >> > when it's necessarily going to be costly to do so, and precludes most
> >> > of the benefits of the no-PLT approach.
> >> > Anyone still wanting/needing
> >> > lazy binding semantics can use PLT, and can even choose on a per-TU
> >> > basis (or maybe even more fine-grained with pragmas/attributes?).
> >> > Those of us who are suffering the cost of PLT with no benefits
> >> > (because we use -Wl,-z,relro -Wl,-z,now) can just be rid of it (by
> >> > adding -fno-plt) and enjoy something like a 10% performance boost in
> >> > PIC/PIE.
> >> >
> >>
> >> There are things compiler can do for performance and correctness
> >> if it is told what options will be passed to linker. -z now is one and
> >> -Bsymbolic is another one:
> >>
> >> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65886
> >>
> >> I think we should add -fnow and -fsymbolic. Together with LTO,
> >> we can generate faster executables as well as shared libraries.
> >
> > I don't see how knowing about -Bsymbolic can help the compiler
> > optimize. Without visibility, it can't know whether the symbols will
> > be defined in the same DSO. With visibility, it can already do the
> > equivalent hints. Perhaps it helps in the case where the symbol is
> > already defined (and non-weak) in the same TU, but I think in this
> > case it should already be optimizing the reference. Symbol
> > interposition over top of a non-weak symbol from the same TU is always
> > invalid and the compiler should not be pessimizing code to make it
> > work.
>
> -Bsymbolic will bind all references to local definitions in shared libraries,
> with and without visibility, weak or non-weak. Compiler can use it
> in binds_tls_local_p and we can generate much better codes in shared
> libraries.

Yes, I'm aware of what it does. But at compile time the compiler can't
know whether the referenced symbol will be defined in the same DSO
unless there is a visibility annotation telling it. Even when linking a
shared library using -Bsymbolic, the library code can still make calls
(or data references) to symbols in other DSOs.

> > As for -fnow, I haven't thought about it much but I also don't see
> > many places where it could help. The only benefit that comes to mind
> > is on targets with weak memory order, where it would eliminate some of
> > the cost of synchronizing TLSDESC lazy bindings (see Szabolcs Nagy's
> > work on AArch64). It might also benefit PLT calls on such targets, but
> > you would get a lot more benefit from -fno-plt, and in that case -fnow
> > would not allow any further optimization.
>
> -fno-plt doesn't work with lazy binding. -fnow tells compiler that
> lazy binding is not used and it can optimize without PLT. With
> -flto -fnow, compiler can make much better choices.

Ah, I see now you had LTO in mind. In that case the compiler does know
when the symbol is defined in the same DSO for -Bsymbolic. So that
clears up the usefulness of your proposed -fsymbolic. I still don't see
how -fnow would have a lot of practical usefulness, but I'm certainly
not opposed to it.

Rich
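
For illustration only, the visibility point above can be sketched in C.
This is a minimal sketch, not code from the thread: helper_hidden,
helper_default and api_entry are hypothetical names, and abs() merely
stands in for any symbol defined in another DSO. It assumes GCC or Clang
on an ELF target, where the visibility attribute is available.

```c
#include <stdlib.h>   /* abs(): stands in for a symbol from another DSO (libc) */

/* Hypothetical library code; every name here is illustrative. */

/* Hidden visibility: the definition cannot be exported from or
   interposed in the final DSO, so the compiler knows at compile time
   that the reference binds locally and can call it directly. */
__attribute__((visibility("hidden")))
int helper_hidden(int x)
{
    return x + 1;
}

/* Default visibility: when this file is built as PIC for a shared
   library, the compiler may have to assume the symbol could be
   interposed, because it cannot see whether the link will use
   -Bsymbolic. */
int helper_default(int x)
{
    return x * 2;
}

/* abs() is only declared in this TU. Even with -Bsymbolic at link
   time it resolves to another DSO, so the compiler alone cannot
   assume a local binding for it. */
int api_entry(int x)
{
    return helper_hidden(x) + helper_default(x) + abs(x);
}
```

Assuming the file is saved as lib.c and the GCC in use supports
-fno-plt, something like

    gcc -O2 -fPIC -fno-plt -shared lib.c -o libdemo.so -Wl,-z,relro -Wl,-z,now

builds it the way described earlier in the thread; comparing the
generated assembly with and without -fno-plt shows how the three calls
in api_entry are treated differently.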