> On Mon, May 04, 2015 at 11:42:20AM -0600, Jeff Law wrote:
> > On 05/04/2015 11:39 AM, Jakub Jelinek wrote:
> > >On Mon, May 04, 2015 at 11:34:05AM -0600, Jeff Law wrote:
> > >>On 05/04/2015 10:37 AM, Alexander Monakov wrote:
> > >>>This patch introduces option -fno-plt that allows to expand calls that would
> > >>>go via PLT to load the address of the function immediately at call site (which
> > >>>introduces a GOT load).  Cover letter explains the motivation for this patch.
> > >>>
> > >>>New option documentation for invoke.texi is missing from the patch; if this is
> > >>>accepted I'll be happy to send a v2 with documentation added.
> > >>>
> > >>> * calls.c (prepare_call_address): Transform PLT call to GOT lookup and
> > >>> indirect call by forcing address into a pseudo with -fno-plt.
> > >>> * common.opt (flag_plt): New option.
> > >>OK once you cobble together the invoke.texi changes.
> > >
> > >Isn't what Michael/Alan suggested better?  I mean as/ld/compiler changes to
> > >inline the plt slot's first part, then lazy binding will work fine.
> > I must have missed Alan/Michael's message.
> > 
> > ISTM the win here is that by going through the GOT, you can CSE the
> > GOT reference and possibly get some more register allocation
> > freedom.  Is that still the case with Alan/Michael's approach?
> 
> There are many advantages to 'going through the GOT'. CSE'ing the
> reference is just one. The biggest (IMO) is that you can avoid the bad
> PLT ABI that most targets have, where making a call to a PLT slot
> requires the GOT address to be pre-loaded into a fixed, call-saved
> register. This precludes sibcalls and forces many functions which
> otherwise would not need their own stack frames to create one for
> saving the old value of the GOT register. See my blog entry on the
> topic here: http://ewontfix.com/18/

One common pattern I noticed while looking at codegen for speculative
devirtualization is that, when we do not inline the virtual call, we end up with

if (ptr == &foo)
  foo()

which leads to both a GOT lookup to get the address of foo and a call across the PLT.
It would be nice to handle this gracefully.
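
To make the pattern concrete, here is a minimal hand-written sketch of what
the speculative check amounts to. The names foo, bar and call_speculative
are hypothetical stand-ins, not anything GCC emits. When built as PIC with
foo preemptible, taking &foo typically needs a GOT load while the direct
call typically goes through the PLT; with -fno-plt the call could in
principle reuse the address already loaded for the comparison.

```cpp
#include <cassert>

// Hypothetical likely target of the virtual call, and some other target.
int foo() { return 42; }
int bar() { return 7; }

// Hand-written form of the speculative check: compare the pointer against
// the predicted target and call that target directly on a match.
int call_speculative(int (*ptr)()) {
    if (ptr == &foo)   // address of foo: a GOT load in PIC code
        return foo();  // direct call: may go via the PLT in PIC code
    return ptr();      // prediction failed: plain indirect call
}
```

Both paths give the same result as calling ptr() directly; only the call
sequence differs.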

Note that one of the improvements I want to make to the devirt machinery is to
change the code sequence to:

 if (vptr == &expected_vtable)
   foo ()
 else
   vptr[token]();

This saves the vtable lookup. But it is not possible in all cases - it can happen
that there are multiple predicted vtables all agreeing on the particular slot.
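
A rough model of the two guards, assuming an explicit hand-rolled vtable
(C++ gives no portable access to the real vptr, so Object, Method and the
*_vtable arrays below are purely illustrative). It also shows the problem
case: derived_vtable is a different table that agrees with expected_vtable
on the slot, so the vtable-compare guard misses it while the slot-compare
guard still hits.

```cpp
#include <cassert>

struct Object;
using Method = int (*)(Object*);

// Illustrative object layout: just a pointer to a table of methods.
struct Object { const Method* vptr; };

int foo_impl(Object*)   { return 1; }
int other_impl(Object*) { return 2; }

const Method expected_vtable[] = { foo_impl };
const Method derived_vtable[]  = { foo_impl };   // other vtable, same slot
const Method other_vtable[]    = { other_impl };

constexpr int token = 0;  // slot index of the method being called

// Guard on the vtable pointer: no slot load on the fast path.
int call_by_vtable(Object* o) {
    if (o->vptr == expected_vtable)
        return foo_impl(o);
    return o->vptr[token](o);
}

// Guard on the slot contents: always loads the slot, but also matches
// every vtable that agrees on this slot.
int call_by_slot(Object* o) {
    Method m = o->vptr[token];
    if (m == &foo_impl)
        return foo_impl(o);
    return m(o);
}
```

For an object using derived_vtable both functions return the same value,
but call_by_vtable gets there through the indirect fallback call.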

Honza
> 
> Anyone who really wants lazy binding can use -fplt (which is
> presumably still the default; I didn't check) but lazy binding should
> largely be considered deprecated anyway since effective use of relro
> protection requires -z now too, in which case you're paying all the
> costs (which are considerable!) for lazy binding support even though
> you won't get it.
> 
> Rich
