On 17 December 2013 07:53, Michael Hudson-Doyle <michael.hud...@linaro.org> wrote: > Michael Hudson-Doyle <michael.hud...@linaro.org> writes: > >> Michael Hudson-Doyle <michael.hud...@linaro.org> writes: >> >>> Will Newton <will.new...@linaro.org> writes: >>> >>>> On 16 December 2013 03:36, Michael Hudson-Doyle >>>> <michael.hud...@linaro.org> wrote: >>>>> Michael Hudson-Doyle <michael.hud...@linaro.org> writes: >>>>> >>>>>> Aaah, you might be onto something there. I built myself a cross gcc-4.8 >>>>>> today and it appeared to compile things correctly (I didn't actually get >>>>>> to run it, but the objdump poking looked right) and I got a bit worried >>>>>> that this was all down to some cosmic ray / corruption when I first >>>>>> compiled it. But, the scripts I cargo culted just use compile binutils >>>>>> from git tip, so if the bug is in binutils... >>>>> >>>>> So I still don't know what's going on, exactly, but I have a debug build >>>>> of binutils now and some clues. It still only happens on real hardware, >>>>> not cross compiling on my laptop, but I think I have an idea as to why. >>>>> This might be complete crack, but anyway. >>>>> >>>>> I think it's to do with the order of things within the GOT. >>>>> >>>>> When I cross compile, sort the relocations by address, then count up the >>>>> number of relocations of each type, it looks like this: >>>>> >>>>> $ objdump -C -R build/linux2/*/mongo/base/counter_test | LC_ALL=C sort | >>>>> cut -d' ' -f 2 | uniq -c >>>>> 4 >>>>> 496 R_AARCH64_GLOB_DAT >>>>> 1 R_AARCH64_TLS_TPREL64 >>>>> 103 R_AARCH64_GLOB_DAT >>>>> 305 R_AARCH64_JUMP_SLOT >>>>> 12 R_AARCH64_COPY >>>>> 1 RELOCATION >>>>> 2 >>>>> >>>>> In this case, the code and the relocation agree on where the thread >>>>> local variable is. >>>>> >>>>> When I compile natively, it looks like this: >>>>> >>>>> (t-mwhudson)ubuntu@arm64:~/src/mongo$ objdump -C -R >>>>> build/linux2/*/mongo/base/counter_test | LC_ALL=C sort | cut -d' ' -f 2 | >>>>> uniq -c >>>>> 4 >>>>> 295 R_AARCH64_JUMP_SLOT >>>>> 496 R_AARCH64_GLOB_DAT >>>>> 1 R_AARCH64_TLS_TPREL64 >>>>> 104 R_AARCH64_GLOB_DAT >>>>> 12 R_AARCH64_COPY >>>>> 1 RELOCATION >>>>> 2 >>>>> >>>>> And the code and the relocation disagree on where the thread local >>>>> variable is -- by 298 * sizeof(void*). Which is almost (but I admit, >>>>> not exactly) the number of JUMP_SLOTs that are, in this case, before the >>>>> TLS variable in the GOT. When I compiled in a different way, there were >>>>> only 160 JUMP_SLOTs before the TLS reloc, and the code and relocation >>>>> disagreed by 163 slots. >>>>> >>>>> So is it possible somehow that the GOT has these JUMP_SLOTs inserted >>>>> into it after the relocation for the TLS has been written out? I don't >>>>> really see how but maybe this rings a bell... >>>> >>>> Indeed it does. ;-) >>>> >>>> A similar issue was caused by commit >>>> 692e2b8bcdd8325ebfbe1daace87100d53d15ad6 (which adds ifunc support to >>>> the aarch64 ld backend) but was intended to be fixed by the rework of >>>> the same code in 1419bbe5712af8a43f1d63feb32687649563426d. However I >>>> was never actually able to reproduce the failure case (I saw binaries >>>> that were broken so I know it could happen) so the fix was somewhat >>>> speculative. Hence I am very interested in finding a reproducible case >>>> where this GOT entry misordering happens! >>> >>> I'm possibly doing something wrong, but I've tried to try compiling the >>> suspect binary with both binutils git tip and the commit before >>> 692e2b8bc but both had the problem. So I guess it's something else, or >>> I wasn't testing what I thought I was testing. >> >> Argh, I wasn't testing what I thought I was testing... trying again. > > Ah... found it! This is the code that determines the offset to patch > into the code (elfnn-aarch64.c line 3845): > > value = (symbol_got_offset (input_bfd, h, r_symndx) > + globals->root.sgot->output_section->vma > + globals->root.sgot->output_section->output_offset); > > and this is the code that determines the offset as written into the > relocation (elfnn-aarch64.c line 4248): > > off = symbol_got_offset (input_bfd, h, r_symndx); > ... > rela.r_offset = globals->root.sgot->output_section->vma + > globals->root.sgot->output_offset + off; > > Can you see the difference? The former is > "root.sgot->output_section->output_offset", the latter is > "root.sgot->output_offset".
Yes, that does look a bit odd. > This suggests the rather obvious attached patch. I haven't tested this > exact patch, but its an obvious translation from a patch to > 692e2b8bcdd8325ebfbe1daace87100d53d15ad6^ which does work. I also > haven't tested the second hunk at all, but it seems plausible... Thanks for you analysis, the fix does look plausible indeed. ;-) Have you verified it fixes the problem you were seeing? I'm about to disappear to sunnier climes for three weeks but I'll definitely look at it when I get back. I've added Marcus to CC in case he isn't reading this list. -- Will Newton Toolchain Working Group, Linaro _______________________________________________ linaro-toolchain mailing list linaro-toolchain@lists.linaro.org http://lists.linaro.org/mailman/listinfo/linaro-toolchain