Ping.
On Fri, Jul 11, 2014 at 10:42 AM, Sriraman Tallam <tmsri...@google.com> wrote:
> Ping.
>
> On Thu, Jun 26, 2014 at 10:54 AM, Sriraman Tallam <tmsri...@google.com> wrote:
>> Hi Uros,
>>
>> Could you please review this patch?
>>
>> Thanks
>> Sri
>>
>> On Fri, Jun 20, 2014 at 5:17 PM, Sriraman Tallam <tmsri...@google.com> wrote:
>>> Patch updated.
>>>
>>> Sri
>>>
>>> On Mon, Jun 9, 2014 at 3:55 PM, Sriraman Tallam <tmsri...@google.com> wrote:
>>>> Ping.
>>>>
>>>> On Mon, May 19, 2014 at 11:11 AM, Sriraman Tallam <tmsri...@google.com> wrote:
>>>>> Ping.
>>>>>
>>>>> On Thu, May 15, 2014 at 11:34 AM, Sriraman Tallam <tmsri...@google.com> wrote:
>>>>>> Optimize access to globals with -fpie, x86_64 only:
>>>>>>
>>>>>> Currently, with -fPIE/-fpie, GCC accesses globals that are extern to the
>>>>>> module through the GOT. This takes two instructions: one to load the
>>>>>> address of the global from the GOT and another to load its value. Even
>>>>>> if the global turns out to be defined in the executable at link time,
>>>>>> the access still goes through the GOT, because by then it is too late
>>>>>> to generate a direct access.
>>>>>>
>>>>>> Examples:
>>>>>>
>>>>>> foo.cc
>>>>>> ------
>>>>>> int a_glob;
>>>>>> int main () {
>>>>>>   return a_glob; // defined in this file
>>>>>> }
>>>>>>
>>>>>> With -O2 -fpie -pie, the generated code accesses the global directly
>>>>>> with a PC-relative instruction:
>>>>>>
>>>>>> 5e0 <main>:
>>>>>>   mov 0x165a(%rip),%eax  # 1c40 <a_glob>
>>>>>>
>>>>>> foo.cc
>>>>>> ------
>>>>>> extern int a_glob;
>>>>>> int main () {
>>>>>>   return a_glob; // not defined in this file
>>>>>> }
>>>>>>
>>>>>> With -O2 -fpie -pie, the generated code accesses the global through the
>>>>>> GOT, using two memory loads:
>>>>>>
>>>>>> 6f0 <main>:
>>>>>>   mov 0x1609(%rip),%rax  # 1d00 <_DYNAMIC+0x230>
>>>>>>   mov (%rax),%eax
>>>>>>
>>>>>> This is true even if, in the latter case, the global is defined in the
>>>>>> executable by a different file.
>>>>>>
>>>>>> Some experiments on Google benchmarks show that the extra memory loads
>>>>>> cost 1% to 5% in performance.
>>>>>>
>>>>>>
>>>>>> Solution - Copy Relocations:
>>>>>>
>>>>>> When the linker supports copy relocations, GCC can always assume that
>>>>>> the global will be defined in the executable. For globals that are truly
>>>>>> extern (i.e. they come from shared objects), the linker creates copy
>>>>>> relocations so that they are defined in the executable. As a result, no
>>>>>> global access needs to go through the GOT, which improves performance.
>>>>>>
>>>>>> This recently submitted patch to the gold linker:
>>>>>> https://sourceware.org/ml/binutils/2014-05/msg00092.html
>>>>>> allows gold to generate copy relocations in -pie mode when necessary.
>>>>>>
>>>>>> I have added the option -mld-pie-copyrelocs, which enables this when
>>>>>> combined with -fpie. Note that the BFD linker does not yet support copy
>>>>>> relocations in PIE mode, so this option cannot be used with it.
>>>>>>
>>>>>> Please review.
>>>>>>
>>>>>>
>>>>>> ChangeLog:
>>>>>>
>>>>>> * config/i386/i386.opt (mld-pie-copyrelocs): New option.
>>>>>> * config/i386/i386.c (legitimate_pic_address_disp_p): Check if this
>>>>>> address is still legitimate in the presence of copy relocations
>>>>>> and -fpie.
>>>>>> * testsuite/gcc.target/i386/ld-pie-copyrelocs-1.c: New test.
>>>>>> * testsuite/gcc.target/i386/ld-pie-copyrelocs-2.c: New test.
>>>>>>
>>>>>>
>>>>>>
>>>>>> Patch attached.
>>>>>> Thanks
>>>>>> Sri
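To make the two access patterns and the copy-relocation fix concrete, here is a plain-C model. It is only a sketch under stated assumptions, not what GCC or gold actually emit: the "GOT slots" are ordinary pointer variables filled in by hand, and every name below (loader_fill_got, apply_copy_reloc, b_glob and friends) is hypothetical. R_X86_64_COPY is the real x86-64 relocation type behind copy relocations; the code merely mimics its effect.

```c
#include <assert.h>
#include <stddef.h>

/* ---- Case 1: direct access vs. access through a GOT slot (modeled) ---- */

int a_glob = 7;              /* global that ends up defined in the executable */
static int *got_a_glob;      /* hypothetical GOT slot; the dynamic loader would fill it */

/* Models the dynamic loader writing the resolved address into the slot. */
void loader_fill_got(void)
{
    got_a_glob = &a_glob;
}

/* Direct access: one load of the value, like "mov 0x165a(%rip),%eax". */
int read_direct(void)
{
    return a_glob;
}

/* GOT access: load the address from the slot, then load the value,
 * like "mov 0x1609(%rip),%rax" followed by "mov (%rax),%eax". */
int read_via_got(void)
{
    int *addr = got_a_glob;  /* load 1: address of the global */
    assert(addr != NULL);    /* the slot must have been filled first */
    return *addr;            /* load 2: the value itself */
}

/* ---- Case 2: a copy relocation for a global owned by a shared object ---- */

static const int lib_image_b_glob = 42; /* initializer in the .so's data image */
int exe_copy_b_glob;                    /* space reserved in the executable (.bss) */
static int *lib_got_b_glob;             /* the shared object's own GOT slot */

/* Models R_X86_64_COPY: copy the initial bytes into the executable, and
 * resolve the shared object's reference to that copy. */
void apply_copy_reloc(void)
{
    exe_copy_b_glob = lib_image_b_glob;
    lib_got_b_glob = &exe_copy_b_glob;
}

/* The executable can now read the variable directly (one load). */
int exe_read_b(void)
{
    return exe_copy_b_glob;
}

/* The shared object still goes through its GOT, but lands on the same copy. */
int lib_read_b(void)
{
    assert(lib_got_b_glob != NULL);
    return *lib_got_b_glob;
}
```

Call loader_fill_got() before read_via_got(), and apply_copy_reloc() before the b_glob readers. After apply_copy_reloc(), a write to exe_copy_b_glob is visible through lib_read_b(): once the variable lives in the executable, the executable can read it with a single load, while the shared object still reaches the same object through its own GOT slot.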