* H. J. Lu:

> For TLS calls:
>
> 1. UNSPEC_TLS_GD:
>
>   (parallel [
>     (set (reg:DI 0 ax)
>        (call:DI (mem:QI (symbol_ref:DI ("__tls_get_addr")))
>                 (const_int 0 [0])))
>     (unspec:DI [(symbol_ref:DI ("e") [flags 0x50])
>                 (reg/f:DI 7 sp)] UNSPEC_TLS_GD)
>     (clobber (reg:DI 5 di))])
>
> 2. UNSPEC_TLS_LD_BASE:
>
>   (parallel [
>     (set (reg:DI 0 ax)
>        (call:DI (mem:QI (symbol_ref:DI ("__tls_get_addr")))
>                 (const_int 0 [0])))
>     (unspec:DI [(reg/f:DI 7 sp)] UNSPEC_TLS_LD_BASE)])
>
> 3. UNSPEC_TLSDESC:
>
>   (parallel [
>      (set (reg/f:DI 104)
>          (plus:DI (unspec:DI [
>                      (symbol_ref:DI ("_TLS_MODULE_BASE_") [flags 0x10])
>                        (reg:DI 114)
>                        (reg/f:DI 7 sp)] UNSPEC_TLSDESC)
>                     (const:DI (unspec:DI [
>                                (symbol_ref:DI ("e") [flags 0x1a])
>                             ] UNSPEC_DTPOFF))))
>      (clobber (reg:CC 17 flags))])
>
>   (parallel [
>     (set (reg:DI 101)
>        (unspec:DI [(symbol_ref:DI ("e") [flags 0x50])
>                      (reg:DI 112)
>                      (reg/f:DI 7 sp)] UNSPEC_TLSDESC))
>     (clobber (reg:CC 17 flags))])
>
> they return the same value for the same input value.  But multiple calls
> with the same input value may be generated for simple programs like:
>
> void a(long *);
> int b(void);
> void c(void);
> static __thread long e;
> long
> d(void)
> {
>   a(&e);
>   if (b())
>     c();
>   return e;
> }
>
> When compiled with -O2 -fPIC -mtls-dialect=gnu2, the following codes are
> generated:
>
>       .type   d, @function
> d:
> .LFB0:
>       .cfi_startproc
>       pushq   %rbx
>       .cfi_def_cfa_offset 16
>       .cfi_offset 3, -16
>       leaq    e@TLSDESC(%rip), %rbx
>       movq    %rbx, %rax
>       call    *e@TLSCALL(%rax)
>       addq    %fs:0, %rax
>       movq    %rax, %rdi
>       call    a@PLT
>       call    b@PLT
>       testl   %eax, %eax
>       jne     .L8
>       movq    %rbx, %rax
>       call    *e@TLSCALL(%rax)
>       popq    %rbx
>       .cfi_remember_state
>       .cfi_def_cfa_offset 8
>       movq    %fs:(%rax), %rax
>       ret
>       .p2align 4,,10
>       .p2align 3
> .L8:
>       .cfi_restore_state
>       call    c@PLT
>       movq    %rbx, %rax
>       call    *e@TLSCALL(%rax)
>       popq    %rbx
>       .cfi_def_cfa_offset 8
>       movq    %fs:(%rax), %rax
>       ret
>       .cfi_endproc
>
> There are 3 "call *e@TLSCALL(%rax)".  They all return the same value.
> Rename the remove_redundant_vector pass to the x86_cse pass, for 64bit,
> extend it to also remove redundant TLS calls to generate:
>
> d:
> .LFB0:
>       .cfi_startproc
>       pushq   %rbx
>       .cfi_def_cfa_offset 16
>       .cfi_offset 3, -16
>       leaq    e@TLSDESC(%rip), %rax
>       movq    %fs:0, %rdi
>       call    *e@TLSCALL(%rax)
>       addq    %rax, %rdi
>       movq    %rax, %rbx
>       call    a@PLT
>       call    b@PLT
>       testl   %eax, %eax
>       jne     .L8
>       movq    %fs:(%rbx), %rax
>       popq    %rbx
>       .cfi_remember_state
>       .cfi_def_cfa_offset 8
>       ret
>       .p2align 4,,10
>       .p2align 3
> .L8:
>       .cfi_restore_state
>       call    c@PLT
>       movq    %fs:(%rbx), %rax
>       popq    %rbx
>       .cfi_def_cfa_offset 8
>       ret
>       .cfi_endproc
>
> with only one "call *e@TLSCALL(%rax)".  This reduces the number of
> __tls_get_addr calls in libgcc.a by 72%:
>
> __tls_get_addr calls     before         after
> libgcc.a                 868            243

While this is certainly nice, it does make it harder to resume
coroutines/fibers on a different thread from the one they were
suspended on.  I do not know to what extent that was previously
supported for global-dynamic TLS.  I recall there being other caching
going on (and certainly for errno, because __errno_location is
declared const).
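
To illustrate the concern, here is a minimal sketch.  It is not taken
from the patch, and cached_addr, resume_on_other_thread and
tls_cache_is_stale are hypothetical names.  It uses an explicit pointer
to stand in for a TLS address that the compiler keeps in a callee-saved
register across a suspension point, and a second pthread to stand in
for the thread on which the fiber is resumed.  On glibc older than 2.34
it probably needs -pthread.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <stddef.h>

static __thread long e;

/* Stand-in for the TLS address the compiler would keep in a register
   (hypothetical; real code would not spell this out explicitly). */
static long *cached_addr;

/* Plays the role of the thread on which the fiber is resumed. */
static void *resume_on_other_thread(void *arg)
{
    (void)arg;
    e = 2;  /* this thread's own copy of e */

    /* The cached pointer still refers to the suspending thread's copy,
       so it differs from &e here and still reads the old value. */
    int stale = (cached_addr != &e) && (*cached_addr == 1);
    return (void *)(intptr_t)stale;
}

/* Returns 1 if the cached address refers to the wrong thread's copy. */
int tls_cache_is_stale(void)
{
    pthread_t t;
    void *result = NULL;
    int rc;

    e = 1;
    cached_addr = &e;  /* "suspend": the address is computed once, here */

    rc = pthread_create(&t, NULL, resume_on_other_thread, NULL);
    assert(rc == 0);
    rc = pthread_join(t, &result);
    assert(rc == 0);
    (void)rc;

    return (int)(intptr_t)result;
}
```

Calling tls_cache_is_stale() should return 1: code resumed on the
second thread that trusts the cached address reads and writes the first
thread's copy of e.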

If this impacts software like QEMU, is there a way to get back the old
behavior?
