On Sun, Mar 11, 2012 at 10:55 AM, Uros Bizjak <ubiz...@gmail.com> wrote:
> On Sun, Mar 11, 2012 at 6:11 PM, H.J. Lu <hjl.to...@gmail.com> wrote:
>
>>>>>>>>> X86-64 linker optimizes TLS_MODEL_INITIAL_EXEC to TLS_MODEL_LOCAL_EXEC
>>>>>>>>> by checking
>>>>>>>>>
>>>>>>>>>         movq foo@gottpoff(%rip), %reg
>>>>>>>>>
>>>>>>>>> and
>>>>>>>>>
>>>>>>>>>         addq foo@gottpoff(%rip), %reg
>>>>>>>>>
>>>>>>>>> It uses the REX prefix to avoid the last byte of the previous
>>>>>>>>> instruction.  With 32bit Pmode, we may not have the REX prefix and
>>>>>>>>> the last byte of the previous instruction may be an offset, which
>>>>>>>>> may look like a REX prefix.  IE->LE optimization will generate
>>>>>>>>> corrupted binary.  This patch makes sure we always output a REX
>>>>>>>>> prefix for UNSPEC_GOTNTPOFF.  OK for trunk?
>>>>>>>>
>>>>>>>> Actually, linker has:
>>>>>>>>
>>>>>>>>     case R_X86_64_GOTTPOFF:
>>>>>>>>       /* Check transition from IE access model:
>>>>>>>>             mov foo@gottpoff(%rip), %reg
>>>>>>>>             add foo@gottpoff(%rip), %reg
>>>>>>>>        */
>>>>>>>>
>>>>>>>>       /* Check REX prefix first.  */
>>>>>>>>       if (offset >= 3 && (offset + 4) <= sec->size)
>>>>>>>>         {
>>>>>>>>           val = bfd_get_8 (abfd, contents + offset - 3);
>>>>>>>>           if (val != 0x48 && val != 0x4c)
>>>>>>>>             {
>>>>>>>>               /* X32 may have 0x44 REX prefix or no REX prefix.  */
>>>>>>>>               if (ABI_64_P (abfd))
>>>>>>>>                 return FALSE;
>>>>>>>>             }
>>>>>>>>         }
>>>>>>>>       else
>>>>>>>>         {
>>>>>>>>           /* X32 may not have any REX prefix.  */
>>>>>>>>           if (ABI_64_P (abfd))
>>>>>>>>             return FALSE;
>>>>>>>>           if (offset < 2 || (offset + 3) > sec->size)
>>>>>>>>             return FALSE;
>>>>>>>>         }
>>>>>>>>
>>>>>>>> So, it should handle the case without REX just OK. If it doesn't, then
>>>>>>>> this is a bug in binutils.
>>>>>>>>
>>>>>>>
>>>>>>> The last byte of the displacement in the previous instruction
>>>>>>> may happen to look like a REX byte.
>>>>>>> In that case, linker will overwrite the last byte of the previous
>>>>>>> instruction and generate the wrong instruction sequence.
>>>>>>>
>>>>>>> I need to update linker to enforce the REX byte check.
>>>>>>
>>>>>> One important observation: if we want to follow the x86_64 TLS spec
>>>>>> strictly, we have to use existing DImode patterns only. This also
>>>>>> means that we should NOT convert other TLS patterns to Pmode, since
>>>>>> they explicitly state movq and addq. If this is not the case, then we
>>>>>> need new TLS specification for X32.
>>>>>
>>>>> Here is a patch to properly generate X32 IE sequence.
>>>>>
>>>>> This is the summary of differences between x86-64 TLS and x32 TLS:
>>>>>
>>>>> x86-64                                        x32
>>>>>
>>>>> GD
>>>>> .byte 0x66; leaq foo@tlsgd(%rip),%rdi;        leaq foo@tlsgd(%rip),%rdi;
>>>>> .word 0x6666; rex64; call __tls_get_addr@plt  .word 0x6666; rex64; call __tls_get_addr@plt
>>>>>
>>>>> GD->IE optimization
>>>>> movq %fs:0,%rax; addq x@gottpoff(%rip),%rax   movl %fs:0,%eax; addq x@gottpoff(%rip),%rax
>>>>>
>>>>> GD->LE optimization
>>>>> movq %fs:0,%rax; leaq x@tpoff(%rax),%rax      movl %fs:0,%eax; leaq x@tpoff(%rax),%rax
>>>>>
>>>>> LD
>>>>> leaq foo@tlsld(%rip),%rdi;                    leaq foo@tlsld(%rip),%rdi;
>>>>> call __tls_get_addr@plt                       call __tls_get_addr@plt
>>>>>
>>>>> LD->LE optimization
>>>>> .word 0x6666; .byte 0x66; movq %fs:0, %rax    nopl 0x0(%rax); movl %fs:0, %eax
>>>>>
>>>>> IE
>>>>> movq %fs:0,%reg64;                            movl %fs:0,%reg32;
>>>>> addq x@gottpoff(%rip),%reg64                  addl x@gottpoff(%rip),%reg32
>>>>>
>>>>> or
>>>>>                                               Not supported if Pmode == SImode
>>>>> movq x@gottpoff(%rip),%reg64;                 movq x@gottpoff(%rip),%reg64;
>>>>> movq %fs:(%reg64),%reg32                      movl %fs:(%reg64), %reg32
>>>>>
>>>>> IE->LE optimization
>>>>>
>>>>> movq %fs:0,%reg64;                            movl %fs:0,%reg32;
>>>>> addq x@gottpoff(%rip),%reg64                  addl x@gottpoff(%rip),%reg32
>>>>>
>>>>> to
>>>>>
>>>>> movq %fs:0,%reg64;                            movl %fs:0,%reg32;
>>>>> addq foo@tpoff, %reg64                        addl foo@tpoff, %reg32
>>>>>
>>>>> movq %fs:0,%reg64;                            movl %fs:0,%reg32;
>>>>> leaq foo@tpoff(%reg64), %reg64                leal foo@tpoff(%reg32), %reg32
>>>>>
>>>>> or
>>>>>
>>>>> movq x@gottpoff(%rip),%reg64                  movq x@gottpoff(%rip),%reg64;
>>>>> movl %fs:(%reg64),%reg32                      movl %fs:(%reg64), %reg32
>>>>>
>>>>> to
>>>>>
>>>>> movq foo@tpoff, %reg64                        movq foo@tpoff, %reg64
>>>>> movl %fs:(%reg64),%reg32                      movl %fs:(%reg64), %reg32
>>>>>
>>>>> LE
>>>>> movq %fs:0,%reg64;                            movl %fs:0,%reg32;
>>>>> leaq x@tpoff(%reg64),%reg32                   leal x@tpoff(%reg32),%reg32
>>>>>
>>>>> or
>>>>>
>>>>> movq %fs:0,%reg64;                            movl %fs:0,%reg32;
>>>>> addq $x@tpoff,%reg64                          addl $x@tpoff,%reg32
>>>>>
>>>>> or
>>>>>
>>>>> movq %fs:0,%reg64;                            movl %fs:0,%reg32;
>>>>> movl x@tpoff(%reg64),%reg32                   movl x@tpoff(%reg32),%reg32
>>>>>
>>>>> or
>>>>>
>>>>> movl %fs:x@tpoff,%reg32                       movl %fs:x@tpoff,%reg32
>>>>>
>>>>>
>>>>> X32 TLS implementation is straightforward, except for IE:
>>>>>
>>>>> 1. Since address override works only on the (reg32) part in fs:(reg32),
>>>>> we can't use it as memory operand.  This patch changes
>>>>> ix86_decompose_address to disallow fs:(reg) if Pmode != word_mode.
>>>>> 2. When Pmode == SImode, there may be no REX prefix for ADD.  Avoid
>>>>> any instructions between MOV and ADD, which may interfere with linker
>>>>> IE->LE optimization, since the last byte of the previous instruction
>>>>> before ADD may look like a REX prefix.  This patch adds
>>>>> tls_initial_exec_x32 to make sure that we always have
>>>>>
>>>>>         movl %fs:0, %reg32
>>>>>         addl x@gottpoff(%rip), %reg32
>>>>>
>>>>> so that the last byte of the previous instruction before ADD will
>>>>> never be a REX byte.  Tested on Linux/x32.
>>>>>
>>>>> 2012-03-09  H.J. Lu  <hongjiu...@intel.com>
>>>>>
>>>>>         * config/i386/i386.c (ix86_decompose_address): Disallow fs:(reg)
>>>>>         if Pmode != word_mode.
>>>>>         (legitimize_tls_address): Call gen_tls_initial_exec_x32 if
>>>>>         Pmode == SImode for x32.
>>>>>
>>>>>         * config/i386/i386.md (UNSPEC_TLS_IE_X32): New.
>>>>>         (tls_initial_exec_x32): Likewise.
>>>>
>>>> Nice solution!
>>>>
>>>> OK for mainline.
>>>
>>> Done.
>>>
>>>> BTW: Did you investigate the issue with memory aliasing?
>>>>
>>>
>>> It isn't a problem since it is wrapped in UNSPEC_TLS_IE_X32
>>> which loads address of the TLS symbol.
>>>
>>> Thanks.
>>>
>>
>> Since we must use reg64 in %fs:(%reg) memory operand like
>>
>> movq x@gottpoff(%rip),%reg64;
>> mov %fs:(%reg64),%reg
>>
>> this patch optimizes x32 TLS IE load and store by wrapping
>> %reg64 inside of UNSPEC when Pmode == SImode.  OK for
>> trunk?
>
> I think we should just scrap all these complications and go with the
> idea of clearing MASK_TLS_DIRECT_SEG_REFS.
>
I will give it a try.

--
H.J.
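
For anyone following the REX-byte concern quoted above, here is a minimal C sketch (not the binutils code) of why inspecting the byte three positions before the relocated displacement is ambiguous when ADD has no REX prefix. The helper name byte_at_rex_slot_looks_like_rex and both byte arrays are made up for illustration; the encodings are hand-assembled, and the zeroed displacements stand in for the unresolved foo@gottpoff relocation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, for illustration only: returns 1 when the byte three
   positions before the relocated disp32 has the value of a REX.W (0x48) or
   REX.WR (0x4c) prefix, which are the two values the quoted linker check
   accepts.  It cannot tell a real prefix from a data byte that happens to
   have the same value.  */
static int
byte_at_rex_slot_looks_like_rex (const unsigned char *contents, size_t offset)
{
  assert (offset >= 3);
  unsigned char val = contents[offset - 3];
  return val == 0x48 || val == 0x4c;
}

/* movl 0x48(%rbx), %ecx      -> 8b 4b 48
   addl foo@gottpoff(%rip), %eax (no REX) -> 03 05 <disp32>
   The relocation applies at offset 5.  The disp8 of the first instruction
   (0x48) sits exactly where a REX prefix would be.  */
static const unsigned char unsafe_seq[] = {
  0x8b, 0x4b, 0x48,
  0x03, 0x05, 0x00, 0x00, 0x00, 0x00
};

/* movl %fs:0, %eax           -> 64 8b 04 25 00 00 00 00
   addl foo@gottpoff(%rip), %eax (no REX) -> 03 05 <disp32>
   The relocation applies at offset 10.  The preceding byte is the last
   byte of the zero displacement, so it can never look like a REX.  */
static const unsigned char safe_seq[] = {
  0x64, 0x8b, 0x04, 0x25, 0x00, 0x00, 0x00, 0x00,
  0x03, 0x05, 0x00, 0x00, 0x00, 0x00
};
```

Under these assumptions the helper returns 1 for unsafe_seq at offset 5 and 0 for safe_seq at offset 10, which is the property that pinning movl %fs:0 directly before the ADD is meant to guarantee.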