On Tue, Jul 29, 2025 at 9:28 AM H.J. Lu <hjl.to...@gmail.com> wrote:
>
> On Mon, Jul 28, 2025 at 01:53:08PM -0700, H.J. Lu wrote:
> > On Mon, Jul 28, 2025 at 04:51:24PM +0800, Hongtao Liu wrote:
> > > On Wed, Jul 23, 2025 at 8:07 AM H.J. Lu <hjl.to...@gmail.com> wrote:
> > > >
> > > > For TLS calls:
> > > >
> > > > 1. UNSPEC_TLS_GD:
> > > >
> > > >   (parallel [
> > > >     (set (reg:DI 0 ax)
> > > >          (call:DI (mem:QI (symbol_ref:DI ("__tls_get_addr")))
> > > >                   (const_int 0 [0])))
> > > >     (unspec:DI [(symbol_ref:DI ("e") [flags 0x50])
> > > >                 (reg/f:DI 7 sp)] UNSPEC_TLS_GD)
> > > >     (clobber (reg:DI 5 di))])
> > > >
> > > > 2. UNSPEC_TLS_LD_BASE:
> > > >
> > > >   (parallel [
> > > >     (set (reg:DI 0 ax)
> > > >          (call:DI (mem:QI (symbol_ref:DI ("__tls_get_addr")))
> > > >                   (const_int 0 [0])))
> > > >     (unspec:DI [(reg/f:DI 7 sp)] UNSPEC_TLS_LD_BASE)])
> > > >
> > > > 3. UNSPEC_TLSDESC:
> > > >
> > > >   (parallel [
> > > >      (set (reg/f:DI 104)
> > > >            (plus:DI (unspec:DI [
> > > >                        (symbol_ref:DI ("_TLS_MODULE_BASE_") [flags 
> > > > 0x10])
> > > >                        (reg:DI 114)
> > > >                        (reg/f:DI 7 sp)] UNSPEC_TLSDESC)
> > > >                     (const:DI (unspec:DI [
> > > >                                  (symbol_ref:DI ("e") [flags 0x1a])
> > > >                               ] UNSPEC_DTPOFF))))
> > > >      (clobber (reg:CC 17 flags))])
> > > >
> > > >   (parallel [
> > > >     (set (reg:DI 101)
> > > >          (unspec:DI [(symbol_ref:DI ("e") [flags 0x50])
> > > >                      (reg:DI 112)
> > > >                      (reg/f:DI 7 sp)] UNSPEC_TLSDESC))
> > > >     (clobber (reg:CC 17 flags))])
> > > >
> > > > they return the same value for the same input value.  But multiple calls
> > > > with the same input value may be generated for simple programs like:
> > > >
> > > > void a(long *);
> > > > int b(void);
> > > > void c(void);
> > > > static __thread long e;
> > > > long
> > > > d(void)
> > > > {
> > > >   a(&e);
> > > >   if (b())
> > > >     c();
> > > >   return e;
> > > > }
> > > >
> > > > > When compiled with -O2 -fPIC -mtls-dialect=gnu2, the following code is
> > > > > generated:
> > > >
> > > >         .type   d, @function
> > > > d:
> > > > .LFB0:
> > > >         .cfi_startproc
> > > >         pushq   %rbx
> > > >         .cfi_def_cfa_offset 16
> > > >         .cfi_offset 3, -16
> > > >         leaq    e@TLSDESC(%rip), %rbx
> > > >         movq    %rbx, %rax
> > > >         call    *e@TLSCALL(%rax)
> > > >         addq    %fs:0, %rax
> > > >         movq    %rax, %rdi
> > > >         call    a@PLT
> > > >         call    b@PLT
> > > >         testl   %eax, %eax
> > > >         jne     .L8
> > > >         movq    %rbx, %rax
> > > >         call    *e@TLSCALL(%rax)
> > > >         popq    %rbx
> > > >         .cfi_remember_state
> > > >         .cfi_def_cfa_offset 8
> > > >         movq    %fs:(%rax), %rax
> > > >         ret
> > > >         .p2align 4,,10
> > > >         .p2align 3
> > > > .L8:
> > > >         .cfi_restore_state
> > > >         call    c@PLT
> > > >         movq    %rbx, %rax
> > > >         call    *e@TLSCALL(%rax)
> > > >         popq    %rbx
> > > >         .cfi_def_cfa_offset 8
> > > >         movq    %fs:(%rax), %rax
> > > >         ret
> > > >         .cfi_endproc
> > > >
> > > > There are 3 "call *e@TLSCALL(%rax)".  They all return the same value.
> > > > > Rename the remove_redundant_vector pass to the x86_cse pass and, for 64-bit,
> > > > > extend it to also remove redundant TLS calls to generate:
> > > >
> > > > d:
> > > > .LFB0:
> > > >         .cfi_startproc
> > > >         pushq   %rbx
> > > >         .cfi_def_cfa_offset 16
> > > >         .cfi_offset 3, -16
> > > >         leaq    e@TLSDESC(%rip), %rax
> > > >         movq    %fs:0, %rdi
> > > >         call    *e@TLSCALL(%rax)
> > > >         addq    %rax, %rdi
> > > >         movq    %rax, %rbx
> > > >         call    a@PLT
> > > >         call    b@PLT
> > > >         testl   %eax, %eax
> > > >         jne     .L8
> > > >         movq    %fs:(%rbx), %rax
> > > >         popq    %rbx
> > > >         .cfi_remember_state
> > > >         .cfi_def_cfa_offset 8
> > > >         ret
> > > >         .p2align 4,,10
> > > >         .p2align 3
> > > > .L8:
> > > >         .cfi_restore_state
> > > >         call    c@PLT
> > > >         movq    %fs:(%rbx), %rax
> > > >         popq    %rbx
> > > >         .cfi_def_cfa_offset 8
> > > >         ret
> > > >         .cfi_endproc
> > > >
> > > > with only one "call *e@TLSCALL(%rax)".  This reduces the number of
> > > > __tls_get_addr calls in libgcc.a by 72%:
> > > >
> > > > __tls_get_addr calls     before         after
> > > > libgcc.a                 868            243
> > > >
> > > > gcc/
> > > >
> > > >         PR target/81501
> > > >         * config/i386/i386-features.cc (x86_cse_kind): Add 
> > > > X86_CSE_TLS_GD,
> > > >         X86_CSE_TLS_LD_BASE and X86_CSE_TLSDESC.
> > > >         (redundant_load): Renamed to ...
> > > >         (redundant_pattern): This.
> > > >         (replace_tls_call): New.
> > > >         (ix86_place_single_tls_call): Likewise.
> > > >         (pass_remove_redundant_vector_load): Renamed to ...
> > > >         (pass_x86_cse): This.  Add val, def_insn, mode, scalar_mode, 
> > > > kind,
> > > >         x86_cse, candidate_gnu_tls_p, candidate_gnu2_tls_p and
> > > >         candidate_vector_p.
> > > >         (pass_x86_cse::candidate_gnu_tls_p): New.
> > > >         (pass_x86_cse::candidate_gnu2_tls_p): Likewise.
> > > >         (pass_x86_cse::candidate_vector_p): Likewise.
> > > >         (remove_redundant_vector_load): Renamed to ...
> > > >         (pass_x86_cse::x86_cse): This.  Extend to remove redundant TLS
> > > >         calls.
> > > >         (make_pass_remove_redundant_vector_load): Renamed to ...
> > > >         (make_pass_x86_cse): This.
> > > >         (config/i386/i386-passes.def): Replace
> > > >         pass_remove_redundant_vector_load with pass_x86_cse.
> > > >         config/i386/i386-protos.h (ix86_tls_get_addr): New.
> > > >         (make_pass_remove_redundant_vector_load): Renamed to ...
> > > >         (make_pass_x86_cse): This.
> > > >         * config/i386/i386.cc (ix86_tls_get_addr): Remove static.
> > > >         * config/i386/i386.h (machine_function): Add
> > > >         tls_descriptor_call_multiple_p.
> > > >         * config/i386/i386.md (tls64): New attribute.
> > > >         (@tls_global_dynamic_64_<mode>): Set 
> > > > tls_descriptor_call_multiple_p.
> > > >         (@tls_local_dynamic_base_64_<mode>): Likewise.
> > > >         (@tls_dynamic_gnu2_64_<mode>): Likewise.
> > > >         (*tls_global_dynamic_64_<mode>): Set tls64 attribute to gd.
> > > >         (*tls_local_dynamic_base_64_<mode>): Set tls64 attribute to 
> > > > ld_base.
> > > >         (*tls_dynamic_gnu2_lea_64_<mode>): Set tls64 attribute to lea.
> > > >         (*tls_dynamic_gnu2_call_64_<mode>): Set tls64 attribute to call.
> > > >         (*tls_dynamic_gnu2_combine_64_<mode>): Set tls64 attribute to
> > > >         combine.
> > > >
> > > > gcc/testsuite/
> > > >
> > > >         PR target/81501
> > > >         * g++.target/i386/pr81501-1.C: New test.
> > > >         * gcc.target/i386/pr81501-1a.c: Likewise.
> > > >         * gcc.target/i386/pr81501-1b.c: Likewise.
> > > >         * gcc.target/i386/pr81501-2a.c: Likewise.
> > > >         * gcc.target/i386/pr81501-2b.c: Likewise.
> > > >         * gcc.target/i386/pr81501-3.c: Likewise.
> > > >         * gcc.target/i386/pr81501-4a.c: Likewise.
> > > >         * gcc.target/i386/pr81501-4b.c: Likewise.
> > > >         * gcc.target/i386/pr81501-5.c: Likewise.
> > > >         * gcc.target/i386/pr81501-6a.c: Likewise.
> > > >         * gcc.target/i386/pr81501-6b.c: Likewise.
> > > >         * gcc.target/i386/pr81501-7.c: Likewise.
> > > >         * gcc.target/i386/pr81501-8a.c: Likewise.
> > > >         * gcc.target/i386/pr81501-8b.c: Likewise.
> > > >         * gcc.target/i386/pr81501-9a.c: Likewise.
> > > >         * gcc.target/i386/pr81501-9b.c: Likewise.
> > > >
> > > > Signed-off-by: H.J. Lu <hjl.to...@gmail.com>
> > > > ---
> > > >  gcc/config/i386/i386-features.cc           | 838 +++++++++++++++++----
> > > >  gcc/config/i386/i386-passes.def            |   2 +-
> > > >  gcc/config/i386/i386-protos.h              |   4 +-
> > > >  gcc/config/i386/i386.cc                    |   2 +-
> > > >  gcc/config/i386/i386.h                     |   3 +
> > > >  gcc/config/i386/i386.md                    |  25 +-
> > > >  gcc/testsuite/g++.target/i386/pr81501-1.C  |  16 +
> > > >  gcc/testsuite/gcc.target/i386/pr81501-1a.c |  17 +
> > > >  gcc/testsuite/gcc.target/i386/pr81501-1b.c |   6 +
> > > >  gcc/testsuite/gcc.target/i386/pr81501-2a.c |  17 +
> > > >  gcc/testsuite/gcc.target/i386/pr81501-2b.c |   6 +
> > > >  gcc/testsuite/gcc.target/i386/pr81501-3.c  |   9 +
> > > >  gcc/testsuite/gcc.target/i386/pr81501-4a.c |  51 ++
> > > >  gcc/testsuite/gcc.target/i386/pr81501-4b.c |   6 +
> > > >  gcc/testsuite/gcc.target/i386/pr81501-5.c  |  13 +
> > > >  gcc/testsuite/gcc.target/i386/pr81501-6a.c |  67 ++
> > > >  gcc/testsuite/gcc.target/i386/pr81501-6b.c |  28 +
> > > >  gcc/testsuite/gcc.target/i386/pr81501-7.c  |  20 +
> > > >  gcc/testsuite/gcc.target/i386/pr81501-8a.c |  82 ++
> > > >  gcc/testsuite/gcc.target/i386/pr81501-8b.c |  31 +
> > > >  gcc/testsuite/gcc.target/i386/pr81501-9a.c |  39 +
> > > >  gcc/testsuite/gcc.target/i386/pr81501-9b.c |  22 +
> > > >  22 files changed, 1148 insertions(+), 156 deletions(-)
> > > >  create mode 100644 gcc/testsuite/g++.target/i386/pr81501-1.C
> > > >  create mode 100644 gcc/testsuite/gcc.target/i386/pr81501-1a.c
> > > >  create mode 100644 gcc/testsuite/gcc.target/i386/pr81501-1b.c
> > > >  create mode 100644 gcc/testsuite/gcc.target/i386/pr81501-2a.c
> > > >  create mode 100644 gcc/testsuite/gcc.target/i386/pr81501-2b.c
> > > >  create mode 100644 gcc/testsuite/gcc.target/i386/pr81501-3.c
> > > >  create mode 100644 gcc/testsuite/gcc.target/i386/pr81501-4a.c
> > > >  create mode 100644 gcc/testsuite/gcc.target/i386/pr81501-4b.c
> > > >  create mode 100644 gcc/testsuite/gcc.target/i386/pr81501-5.c
> > > >  create mode 100644 gcc/testsuite/gcc.target/i386/pr81501-6a.c
> > > >  create mode 100644 gcc/testsuite/gcc.target/i386/pr81501-6b.c
> > > >  create mode 100644 gcc/testsuite/gcc.target/i386/pr81501-7.c
> > > >  create mode 100644 gcc/testsuite/gcc.target/i386/pr81501-8a.c
> > > >  create mode 100644 gcc/testsuite/gcc.target/i386/pr81501-8b.c
> > > >  create mode 100644 gcc/testsuite/gcc.target/i386/pr81501-9a.c
> > > >  create mode 100644 gcc/testsuite/gcc.target/i386/pr81501-9b.c
> > > >
> > > > > diff --git a/gcc/config/i386/i386-features.cc b/gcc/config/i386/i386-features.cc
> > > > index c131577805f..80a1e6caa0e 100644
> > > > --- a/gcc/config/i386/i386-features.cc
> > > > +++ b/gcc/config/i386/i386-features.cc
> > > > @@ -3493,10 +3493,13 @@ enum x86_cse_kind
> > > >  {
> > > >    X86_CSE_CONST0_VECTOR,
> > > >    X86_CSE_CONSTM1_VECTOR,
> > > > -  X86_CSE_VEC_DUP
> > > > +  X86_CSE_VEC_DUP,
> > > > +  X86_CSE_TLS_GD,
> > > > +  X86_CSE_TLS_LD_BASE,
> > > > +  X86_CSE_TLSDESC
> > > >  };
> > > >
> > > > -struct redundant_load
> > > > +struct redundant_pattern
> > > >  {
> > > >    /* Bitmap of basic blocks with broadcast instructions.  */
> > > >    auto_bitmap bbs;
> > > > @@ -3669,22 +3672,541 @@ ix86_broadcast_inner (rtx op, machine_mode 
> > > > mode,
> > > >    return op;
> > > >  }
> > > >
> > > > > -/* At entry of the nearest common dominator for basic blocks with vector
> > > > -   CONST0_RTX and integer CONSTM1_RTX uses, generate a single widest
> > > > -   vector set instruction for all CONST0_RTX and integer CONSTM1_RTX
> > > > -   uses.
> > > > +/* Replace CALL instruction in TLS_CALL_INSNS with SET from SRC.  */
> > > >
> > > > -   NB: We want to generate only a single widest vector set to cover the
> > > > -   whole function.  The LCM algorithm isn't appropriate here since it
> > > > -   may place a vector set inside the loop.  */
> > > > +static void
> > > > +replace_tls_call (rtx src, auto_bitmap &tls_call_insns)
> > > > +{
> > > > +  bitmap_iterator bi;
> > > > +  unsigned int id;
> > > >
> > > > -static unsigned int
> > > > -remove_redundant_vector_load (void)
> > > > +  EXECUTE_IF_SET_IN_BITMAP (tls_call_insns, 0, id, bi)
> > > > +    {
> > > > +      rtx_insn *insn = DF_INSN_UID_GET (id)->insn;
> > > > +
> > > > +      /* If this isn't a CALL, only GNU2 TLS implicit CALL patterns are
> > > > +        allowed.  */
> > >
> > > > +      if (!CALL_P (insn))
> > > > +       {
> > > > +         attr_tls64 tls64 = get_attr_tls64 (insn);
> > > > +         if (tls64 != TLS64_CALL && tls64 != TLS64_COMBINE)
> > > > +           gcc_unreachable ();
> > > > +       }
> > > > +
> > > > +      rtx pat = PATTERN (insn);
> > > > +      if (GET_CODE (pat) != PARALLEL)
> > > > +       gcc_unreachable ();
> > > > +
> > > > +      int j;
> > > > +      rtx op, dest = nullptr;
> > > > +      for (j = XVECLEN (pat, 0) - 1; j >= 0; j--)
> > >
> > > > SET is always the first element of the parallel for tls64
> > > > "combine/call/ld_base/gd", so is the iteration needed?
> > >
> >
> > Fixed in v4.
> >
> > > > +       {
> > > > +         op = XVECEXP (pat, 0, j);
> > > > +         if (GET_CODE (op) == SET)
> > > > +           {
> > > > +             dest = SET_DEST (op);
> > > > +             break;
> > > > +           }
> > > > +       }
> > > > +
> > > > +      rtx set = gen_rtx_SET (dest, src);
> > > > +      rtx_insn *set_insn = emit_insn_after (set, insn);
> > > > +      if (recog_memoized (set_insn) < 0)
> > > > +       gcc_unreachable ();
> > > > +
> > > > +      if (dump_file)
> > > > +       {
> > > > +         fprintf (dump_file, "\nReplace:\n\n");
> > > > +         print_rtl_single (dump_file, insn);
> > > > +         fprintf (dump_file, "\nwith:\n\n");
> > > > +         print_rtl_single (dump_file, set_insn);
> > > > +         fprintf (dump_file, "\n");
> > > > +       }
> > > > +
> > > > +      /* Delete the CALL insn.  */
> > > > +      delete_insn (insn);
> > > > +
> > > > +      df_insn_rescan (set_insn);
> > > > +    }
> > > > +}
> > > > +
> > > > > +/* Generate a TLS call of KIND with VAL and copy the call result to DEST,
> > > > > +   at entry of the nearest dominator for basic block map BBS, which is in
> > > > > +   the fake loop that contains the whole function, so that there is only
> > > > > +   a single TLS CALL of KIND with VAL in the whole function.  If
> > > > > +   TLSDESC_SET isn't nullptr, insert it before the TLS call.  */
> > > > +
> > > > +static void
> > > > +ix86_place_single_tls_call (rtx dest, rtx val, x86_cse_kind kind,
> > > > +                           bitmap bbs, rtx tlsdesc_set = nullptr)
> > > > +{
> > > > > +  basic_block bb = nearest_common_dominator_for_set (CDI_DOMINATORS, bbs);
> > > > +  while (bb->loop_father->latch
> > > > +        != EXIT_BLOCK_PTR_FOR_FN (cfun))
> > > > +    bb = get_immediate_dominator (CDI_DOMINATORS,
> > > > +                                 bb->loop_father->header);
> > > > +
> > > > +  rtx_insn *insn = BB_HEAD (bb);
> > > > +  while (insn && !NONDEBUG_INSN_P (insn))
> > > > +    {
> > > > +      if (insn == BB_END (bb))
> > > > +       {
> > > > +         insn = NULL;
> > > > +         break;
> > > > +       }
> > > > +      insn = NEXT_INSN (insn);
> > > > +    }
> > > > +
> > > > +  rtx rax = nullptr, rdi;
> > > > +  rtx eqv = nullptr;
> > > > +  rtx caddr;
> > > > +  rtx set;
> > > > +  rtx clob;
> > > > +  rtx symbol;
> > > > +  rtx tls;
> > > > +  rtx_insn *tls_insn;
> > > > +
> > > > +  switch (kind)
> > > > +    {
> > > > +    case X86_CSE_TLS_GD:
> > > > +      rax = gen_rtx_REG (Pmode, AX_REG);
> > > > +      rdi = gen_rtx_REG (Pmode, DI_REG);
> > > > +      caddr = ix86_tls_get_addr ();
> > > > +
> > > > +      symbol = XVECEXP (val, 0, 0);
> > > > +      tls = gen_tls_global_dynamic_64 (Pmode, rax, symbol, caddr, rdi);
> > > > +
> > > > +      if (GET_MODE (symbol) != Pmode)
> > > > +       symbol = gen_rtx_ZERO_EXTEND (Pmode, symbol);
> > > > +      eqv = symbol;
> > > > +      break;
> > > > +
> > > > +    case X86_CSE_TLS_LD_BASE:
> > > > +      rax = gen_rtx_REG (Pmode, AX_REG);
> > > > +      rdi = gen_rtx_REG (Pmode, DI_REG);
> > >
> > > > Considering that the pass runs before register allocation, if we use a
> > > > pseudo-register, RA will handle the pattern with clobber rdi/rsi/rax
> > > > by itself.
> >
> > These patterns take explicit RAX and RDI register operands.  Using
> > pseudo-registers doesn't work:
> >
> > /export/build/gnu/tools-build/gcc-gitlab-debug/build-x86_64-linux/gcc/xgcc -B/export/build/gnu/tools-build/gcc-gitlab-debug/build-x86_64-linux/gcc/ -O2 -fPIC -mx32 -S pr81501-1a.c
> > pr81501-1a.c: In function ‘d’:
> > pr81501-1a.c:15:1: error: unable to generate reloads for:
> >    15 | }
> >       | ^
> > (call_insn/u 43 2 44 2 (parallel [
> >             (set (reg:SI 112)
> >                 (call:SI (mem:QI (symbol_ref:SI ("__tls_get_addr")) [0  S1 A8])
> >                     (const_int 0 [0])))
> >             (unspec:SI [
> >                     (reg/f:SI 7 sp)
> >                 ] UNSPEC_TLS_LD_BASE)
> >             (clobber (reg:SI 113))
> >         ]) 1660 {*tls_local_dynamic_base_64_si}
> >      (expr_list:REG_EH_REGION (const_int -2147483648 [0xffffffff80000000])
> >         (nil))
> >     (nil))
> > during RTL pass: reload
> > pr81501-1a.c:15:1: internal compiler error: in curr_insn_transform, at lra-constraints.cc:4372
> >
> > >
> > > > +      caddr = ix86_tls_get_addr ();
> > > > +
> > > > +      tls = gen_tls_local_dynamic_base_64 (Pmode, rax, caddr, rdi);
> > > > +
> > > > +      /* Attach a unique REG_EQUAL to DEST, to allow the RTL optimizers
> > > > +        to share the LD_BASE result with other LD model accesses.  */
> > > > +      eqv = gen_rtx_UNSPEC (Pmode, gen_rtvec (1, const0_rtx),
> > > > +                           UNSPEC_TLS_LD_BASE);
> > > > +
> > > > +      break;
> > > > +
> > > > +    case X86_CSE_TLSDESC:
> > > > +      set = gen_rtx_SET (dest, val);
> > > > +      clob = gen_rtx_CLOBBER (VOIDmode,
> > > > +                             gen_rtx_REG (CCmode, FLAGS_REG));
> > > > +      tls = gen_rtx_PARALLEL (VOIDmode, gen_rtvec (2, set, clob));
> > > > +      break;
> > > > +
> > > > +    default:
> > > > +      gcc_unreachable ();
> > > > +    }
> > > > +
> > > > +  rtx_insn *before = nullptr;
> > > > +  rtx_insn *after = nullptr;
> > > > +  if (insn == BB_HEAD (bb))
> > > > +    before = insn;
> > > > +  else
> > > > +    after = insn ? PREV_INSN (insn) : BB_END (bb);
> > > > +
> > > > +  /* TLS_GD and TLS_LD_BASE instructions are normal functions which
> > > > +     clobber caller-saved registers.  TLSDESC instructions are special
> > > > +     functions which only clobber RAX.  If any registers clobbered by
> > > > +     the TLS instruction are live in this basic block, we must insert
> > > > +     the TLS instruction after all live registers clobbered by the TLS
> > > > +     instruction are dead.  */
> > > > +
> > > > +  auto_bitmap live_caller_saved_regs;
> > > > +  bitmap in = df_live ? DF_LIVE_IN (bb) : DF_LR_IN (bb);
> > > > +
> > > > +  bool flags_live_p = bitmap_bit_p (in, FLAGS_REG);
> > > > +
> > > > +  unsigned int i;
> > > > +
> > > > +  /* Get all live caller-saved registers.  */
> > > > +  if (kind == X86_CSE_TLSDESC)
> > > > +    {
> > > > +      if (bitmap_bit_p (in, AX_REG))
> > > > +       bitmap_set_bit (live_caller_saved_regs, AX_REG);
> > >
> > > And we don't need to check for those hard registers here and below?
> >
> > TLS_GD and TLS_LD_BASE instructions are normal functions which
> > clobber caller-saved registers.  TLSDESC instructions are special
> > functions which only clobber RAX.  live_caller_saved_regs captures
> > live caller-saved registers for these TLS instructions.

I notice those insns are CALL_INSNs, and per the ABI rax/rdi/rsi are
caller-saved registers. So even if we explicitly use (clobber (reg: RAX)),
won't RA save and restore the register for us?

> >
> > >
> > > > +    }
> > > > +  else
> > > > +    for (i = 0; i < FIRST_PSEUDO_REGISTER; i++)
> > > > +      if (call_used_regs[i]
> > > > +         && !fixed_regs[i]
> > > > +         && bitmap_bit_p (in, i))
> > > > +       bitmap_set_bit (live_caller_saved_regs, i);
> > > > +
> > > > +  if (!bitmap_empty_p (live_caller_saved_regs))
> > > > +    {
> > > > +      /* Search for REG_DEAD notes in this basic block.  */
> > > > +      FOR_BB_INSNS (bb, insn)
> > > > +       {
> > > > +         if (!NONDEBUG_INSN_P (insn))
> > > > +           continue;
> > > > +
> > > > +         /* Check if FLAGS register is live.  */
> > > > +         set = single_set (insn);
> > > > +         if (set)
> > > > +           {
> > > > +             rtx dest = SET_DEST (set);
> > > > +             if (REG_P (dest) && REGNO (dest) == FLAGS_REG)
> > > > +               flags_live_p = true;
> > > > +           }
> > > > +
> > > > +         rtx link;
> > > > +         for (link = REG_NOTES (insn); link; link = XEXP (link, 1))
> > > > +           if (REG_NOTE_KIND (link) == REG_DEAD
> > > > +               && REG_P (XEXP (link, 0)))
> > > > +             {
> > > > +               /* Mark the live caller-saved register as dead.  */
> > > > +               for (i = REGNO (XEXP (link, 0));
> > > > +                    i < END_REGNO (XEXP (link, 0));
> > > > +                    i++)
> > > > +                 bitmap_clear_bit (live_caller_saved_regs, i);
> > > > +
> > > > +               /* Check if FLAGS register is dead.  */
> > > > +               if (REGNO (XEXP (link, 0)) == FLAGS_REG)
> > > > +                 flags_live_p = false;
> > > > +
> > > > +               if (bitmap_empty_p (live_caller_saved_regs))
> > > > +                 {
> > > > +                   /* All live caller-saved registers are dead after
> > > > +                      this instruction.  Since TLS instructions
> > > > +                      clobber FLAGS register, it must be dead where
> > > > +                      the TLS will be inserted after.  */
> > > > +                   if (flags_live_p)
> > > > +                     gcc_unreachable ();
> > > > +                   after = insn;
> > > > +                   goto insert_after;
> > > > +                 }
> > > > +             }
> > > > +       }
> > > > +
> > > > +      /* All live caller-saved registers should be dead at the end
> > > > +        of this basic block.  */
> > > > +      gcc_unreachable ();
> > > > +    }
> > > > +
> > > > +  /* Emit the TLS CALL insn.  */
> > > > +  if (after)
> > > > +    {
> > > > +insert_after:
> > > > +      tls_insn = emit_insn_after (tls, after);
> > > > +    }
> > > > +  else
> > > > +    tls_insn = emit_insn_before (tls, before);
> > > > +
> > > > +  rtx_insn *tlsdesc_insn = nullptr;
> > > > +  if (tlsdesc_set)
> > > > +    {
> > > > +      rtx dest = copy_rtx (SET_DEST (tlsdesc_set));
> > > > +      rtx src = copy_rtx (SET_SRC (tlsdesc_set));
> > > > +      tlsdesc_set = gen_rtx_SET (dest, src);
> > > > +      tlsdesc_insn = emit_insn_before (tlsdesc_set, tls_insn);
> > > > +    }
> > > > +
> > > > +  if (kind != X86_CSE_TLSDESC)
> > > > +    {
> > > > +      RTL_CONST_CALL_P (tls_insn) = 1;
> > > > +
> > > > +      /* Indicate that this function can't jump to non-local gotos.  */
> > > > +      make_reg_eh_region_note_nothrow_nononlocal (tls_insn);
> > > > +    }
> > > > +
> > > > +  if (recog_memoized (tls_insn) < 0)
> > > > +    gcc_unreachable ();
> > > > +
> > > > +  if (dump_file)
> > > > +    {
> > > > +      if (after)
> > > > +       {
> > > > +         fprintf (dump_file, "\nPlace:\n\n");
> > > > +         if (tlsdesc_insn)
> > > > +           print_rtl_single (dump_file, tlsdesc_insn);
> > > > +         print_rtl_single (dump_file, tls_insn);
> > > > +         fprintf (dump_file, "\nafter:\n\n");
> > > > +         print_rtl_single (dump_file, after);
> > > > +         fprintf (dump_file, "\n");
> > > > +       }
> > > > +      else
> > > > +       {
> > > > +         fprintf (dump_file, "\nPlace:\n\n");
> > > > +         if (tlsdesc_insn)
> > > > +           print_rtl_single (dump_file, tlsdesc_insn);
> > > > +         print_rtl_single (dump_file, tls_insn);
> > > > +         fprintf (dump_file, "\nbefore:\n\n");
> > > > +         print_rtl_single (dump_file, insn);
> > > > +         fprintf (dump_file, "\n");
> > > > +       }
> > > > +    }
> > > > +
> > > > +  if (kind != X86_CSE_TLSDESC)
> > > > +    {
> > > > +      /* Copy RAX to DEST.  */
> > > > +      set = gen_rtx_SET (dest, rax);
> > > > +      rtx_insn *set_insn = emit_insn_after (set, tls_insn);
> > > > +      set_dst_reg_note (set_insn, REG_EQUAL, copy_rtx (eqv), dest);
> > > > +      if (dump_file)
> > > > +       {
> > > > +         fprintf (dump_file, "\nPlace:\n\n");
> > > > +         print_rtl_single (dump_file, set_insn);
> > > > +         fprintf (dump_file, "\nafter:\n\n");
> > > > +         print_rtl_single (dump_file, tls_insn);
> > > > +         fprintf (dump_file, "\n");
> > > > +       }
> > > > +    }
> > > > +}
> > > > +
> > > > +namespace {
> > > > +
> > > > +const pass_data pass_data_x86_cse =
> > > > +{
> > > > +  RTL_PASS, /* type */
> > > > +  "x86_cse", /* name */
> > > > +  OPTGROUP_NONE, /* optinfo_flags */
> > > > +  TV_MACH_DEP, /* tv_id */
> > > > +  0, /* properties_required */
> > > > +  0, /* properties_provided */
> > > > +  0, /* properties_destroyed */
> > > > +  0, /* todo_flags_start */
> > > > +  0, /* todo_flags_finish */
> > > > +};
> > > > +
> > > > +class pass_x86_cse : public rtl_opt_pass
> > > > +{
> > > > +public:
> > > > +  pass_x86_cse (gcc::context *ctxt)
> > > > +    : rtl_opt_pass (pass_data_x86_cse, ctxt)
> > > > +  {}
> > > > +
> > > > +  /* opt_pass methods: */
> > > > +  bool gate (function *fun) final override
> > > > +    {
> > > > +      return (TARGET_SSE2
> > > > +             && optimize
> > > > +             && optimize_function_for_speed_p (fun));
> > > > +    }
> > > > +
> > > > +  unsigned int execute (function *) final override
> > > > +    {
> > > > +      return x86_cse ();
> > > > +    }
> > > > +
> > > > +private:
> > > > +  /* The redundant source value.  */
> > > > +  rtx val;
> > > > +  /* The instruction which defines the redundant value.  */
> > > > +  rtx_insn *def_insn;
> > > > > +  /* Mode of the destination of the candidate redundant instruction.  */
> > > > +  machine_mode mode;
> > > > +  /* Mode of the source of the candidate redundant instruction.  */
> > > > +  machine_mode scalar_mode;
> > > > +  /* The classification of the candidate redundant instruction.  */
> > > > +  x86_cse_kind kind;
> > > > +
> > > > +  unsigned int x86_cse (void);
> > > > +  bool candidate_gnu_tls_p (rtx_insn *, attr_tls64);
> > > > +  bool candidate_gnu2_tls_p (rtx, attr_tls64);
> > > > +  bool candidate_vector_p (rtx);
> > > > +}; // class pass_x86_cse
> > > > +
> > > > +/* Return true and output def_insn, val, mode, scalar_mode and kind if
> > > > +   INSN is UNSPEC_TLS_GD or UNSPEC_TLS_LD_BASE.  */
> > > > +
> > > > +bool
> > > > +pass_x86_cse::candidate_gnu_tls_p (rtx_insn *insn, attr_tls64 tls64)
> > > > +{
> > > > +  if (!TARGET_64BIT || !cfun->machine->tls_descriptor_call_multiple_p)
> > > > +    return false;
> > > > +
> > > > +  /* Record the redundant TLS CALLs for 64-bit:
> > > > +
> > > > +     (parallel [
> > > > +       (set (reg:DI 0 ax)
> > > > +            (call:DI (mem:QI (symbol_ref:DI ("__tls_get_addr")))
> > > > +                     (const_int 0 [0])))
> > > > +       (unspec:DI [(symbol_ref:DI ("foo") [flags 0x50])
> > > > +                   (reg/f:DI 7 sp)] UNSPEC_TLS_GD)
> > > > +       (clobber (reg:DI 5 di))])
> > > > +
> > > > +
> > > > +     and
> > > > +
> > > > +     (parallel [
> > > > +       (set (reg:DI 0 ax)
> > > > +            (call:DI (mem:QI (symbol_ref:DI ("__tls_get_addr")))
> > > > +                     (const_int 0 [0])))
> > > > +       (unspec:DI [(reg/f:DI 7 sp)] UNSPEC_TLS_LD_BASE)])
> > > > +
> > > > +   */
> > > > +
> > > > +  rtx pat = PATTERN (insn);
> > > > +  rtx set = XVECEXP (pat, 0, 0);
> > > > +  gcc_assert (GET_CODE (set) == SET);
> > > > +  rtx dest = SET_DEST (set);
> > > > +  scalar_mode = mode = GET_MODE (dest);
> > > > +  val = XVECEXP (pat, 0, 1);
> > > > +  gcc_assert (GET_CODE (val) == UNSPEC);
> > > > +
> > > > +  if (tls64 == TLS64_GD)
> > > > +    kind = X86_CSE_TLS_GD;
> > > > +  else
> > > > +    kind = X86_CSE_TLS_LD_BASE;
> > > > +
> > > > +  def_insn = nullptr;
> > > > +  return true;
> > > > +}
> > > > +
> > > > +/* Return true and output def_insn, val, mode, scalar_mode and kind if
> > > > +   SET is UNSPEC_TLSDESC.  */
> > > > +
> > > > +bool
> > > > +pass_x86_cse::candidate_gnu2_tls_p (rtx set, attr_tls64 tls64)
> > > > +{
> > > > +  if (!TARGET_64BIT || !cfun->machine->tls_descriptor_call_multiple_p)
> > > > +    return false;
> > > > +
> > > > +  /* Record GNU2 TLS CALLs for 64-bit:
> > > > +
> > > > +     (set (reg/f:DI 104)
> > > > +         (plus:DI (unspec:DI [
> > > > +                     (symbol_ref:DI ("_TLS_MODULE_BASE_") [flags 0x10])
> > > > +                     (reg:DI 114)
> > > > +                     (reg/f:DI 7 sp)] UNSPEC_TLSDESC)
> > > > +                  (const:DI (unspec:DI [
> > > > +                               (symbol_ref:DI ("e") [flags 0x1a])
> > > > +                            ] UNSPEC_DTPOFF))))
> > > > +
> > > > +     and
> > > > +
> > > > +     (set (reg:DI 101)
> > > > +         (unspec:DI [(symbol_ref:DI ("foo") [flags 0x50])
> > > > +                     (reg:DI 112)
> > > > +                     (reg/f:DI 7 sp)] UNSPEC_TLSDESC))
> > > > +
> > > > +   */
> > > > +
> > > > +  rtx src = SET_SRC (set);
> > > > +  if (tls64 == TLS64_CALL)
> > > > +    val = src;
> > > > +  else
> > > > +    {
> > > > +      val = src;
> > > > +      src = XEXP (src, 0);
> > > > +    }
> > > > +
> > >
> > > val = src;
> > > if (tls64 != TLS64_CALL)
> > >   src = XEXP (src, 0);
> >
> > Fixed in v4.
> >
> > >
> > > > +  kind = X86_CSE_TLSDESC;
> > > > +  gcc_assert (GET_CODE (src) == UNSPEC);
> > > > +  src = XVECEXP (src, 0, 1);
> > > > +  scalar_mode = mode = GET_MODE (src);
> > > > +  if (REG_P (src))
> > > > +    {
> > > > +      /* All definitions of reg:DI 129 in
> > > > +
> > > > +        (set (reg:DI 110)
> > > > +             (unspec:DI [(symbol_ref:DI ("foo"))
> > > > +                         (reg:DI 129)
> > > > +                         (reg/f:DI 7 sp)] UNSPEC_TLSDESC))
> > > > +
> > > > +        should have the same source as in
> > > > +
> > > > +        (set (reg:DI 129)
> > > > +             (unspec:DI [(symbol_ref:DI ("foo"))] UNSPEC_TLSDESC))
> > > > +
> > > > +       */
> > > > +
> > > > +      df_ref ref;
> > > > +      rtx_insn *set_insn = nullptr;
> > > > +      rtx tls_src = nullptr;
> > > > +      for (ref = DF_REG_DEF_CHAIN (REGNO (src));
> > > > +          ref;
> > > > +          ref = DF_REF_NEXT_REG (ref))
> > >
> > > I think we just need to check if XVECEXP (src, 0, 0) /*
> > > "(symbol_ref:DI ("foo"))" */ is the same since XVECEXP (src, 0, 1) is
> > > always set by XVECEXP (src, 0, 0), according to
> > > "tls_dynamic_gnu2_64_<mode>"
> >
> > v4 is changed to:
> >
> >   rtx tls_symbol = XVECEXP (src, 0, 0);
> >   src = XVECEXP (src, 0, 1);
> > ...
> >           rtx tls_set = PATTERN (set_insn);
> >           rtx tls_src = XVECEXP (SET_SRC (tls_set), 0, 0);
> >           if (!rtx_equal_p (tls_symbol, tls_src))
> >             {
> >               set_insn = nullptr;
> >               break;
> >             }

According to @tls_dynamic_gnu2_64_<mode>, tls_src must be equal to
tls_symbol. Do we really need to walk the def-use chain to check
that? We could simply record tls_symbol as the VAL.
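
To make the idea concrete, here is a toy C++ sketch (not GCC code) of what keying on the symbol would mean: each candidate is bucketed by its symbol name, and every call after the first in a bucket is redundant. TlsCall, group_by_symbol and redundant_calls are names made up for illustration. The real pass works on rtx and insn UIDs, and placement of the surviving call would still go through ix86_place_single_tls_call.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for a TLS descriptor call candidate.
struct TlsCall {
  int insn_uid;        // made-up insn identifier
  std::string symbol;  // TLS symbol the call resolves, e.g. "e"
};

// Bucket candidate calls by the symbol they resolve.
std::map<std::string, std::vector<int>>
group_by_symbol(const std::vector<TlsCall> &calls)
{
  std::map<std::string, std::vector<int>> groups;
  for (const TlsCall &c : calls)
    groups[c.symbol].push_back(c.insn_uid);
  return groups;
}

// Number of calls that could be dropped if one call per symbol is kept.
std::size_t
redundant_calls(const std::vector<TlsCall> &calls)
{
  std::size_t n = 0;
  for (const auto &g : group_by_symbol(calls))
    n += g.second.size() - 1;  // every bucket holds at least one call
  return n;
}
```

For the d() example in the commit message, the three calls on "e" land in one bucket, so two of them count as redundant.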
> >
> > >
> > > > +       {
> > > > +         if (DF_REF_IS_ARTIFICIAL (ref))
> > > > +           break;
> > > > +
> > > > +         set_insn = DF_REF_INSN (ref);
> > > > +         tls64 = get_attr_tls64 (set_insn);
> > > > +         if (tls64 != TLS64_LEA)
> > > > +           {
> > > > +             set_insn = nullptr;
> > > > +             break;
> > > > +           }
> > > > +
> > > > +         rtx tls_set = PATTERN (set_insn);
> > > > +         if (!tls_src)
> > > > +           tls_src = SET_SRC (tls_set);
> > > > +         else if (!rtx_equal_p (tls_src, SET_SRC (tls_set)))
> > > > +           {
> > > > +             set_insn = nullptr;
> > > > +             break;
> > > > +           }
> > > > +       }
> > > > +
> > > > +      if (!set_insn)
> > > > +       return false;
> > > > +
> > > > +      def_insn = set_insn;
> > > > +    }
> > > > +  else if (GET_CODE (src) == UNSPEC
> > > > +          && XINT (src, 1) == UNSPEC_TLSDESC
> > > > +          && SYMBOL_REF_P (XVECEXP (src, 0, 0)))
> > > > +    def_insn = nullptr;
> > >
> > > > Similarly here: this is supposed to handle
> > > > "*tls_dynamic_gnu2_combine_64_<mode>". According to the splitter
> > > > pattern, the output value can also be CSEd with tls64_call whenever
> > > > the symbol_ref in the second operand of PLUS is the same.
> >
> > v4 is changed to
> >
> >   rtx tls_symbol = XVECEXP (src, 0, 0);
> >   src = XVECEXP (src, 0, 1);
> >   scalar_mode = mode = GET_MODE (src);
> >   gcc_assert (REG_P (src));
>
> Since this triggered the assert on
>
> (set (reg/f:DI 103)
>     (plus:DI (unspec:DI [
>                 (symbol_ref:DI ("_TLS_MODULE_BASE_") [flags 0x10])
>                 (unspec:DI [
>                         (symbol_ref:DI ("_TLS_MODULE_BASE_") [flags 0x10])
>                     ] UNSPEC_TLSDESC)
>                 (reg/f:DI 7 sp)
>             ] UNSPEC_TLSDESC)
>         (const:DI (unspec:DI [
>                     (symbol_ref:DI ("foo") [flags 0x1a] <var_decl 0x7fffe99dbe40 foo>)
>                 ] UNSPEC_DTPOFF))))
>
> I added a testcase and kept the v3 code.

For tls64 combine, tls_symbol should come from the second operand of the
original src, i.e.
         (const:DI (unspec:DI [
                     (symbol_ref:DI ("foo") [flags 0x1a] <var_decl 0x7fffe99dbe40 foo>)  <--- this
                 ] UNSPEC_DTPOFF))))

and not from (symbol_ref:DI ("_TLS_MODULE_BASE_")). With that there is no
need for gcc_assert (REG_P (src)).
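
As a rough illustration of which operand I mean, here is a toy C++ model of the two RTL shapes. It is not the GCC rtx API: Node, mk_node and tls_symbol_name are invented for this sketch, and only the operand positions are taken from the dumps quoted above.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for rtx; the real pass uses GET_CODE/XEXP/XVECEXP instead.
enum class Code { SymbolRef, Reg, Unspec, Plus, Const };
enum class Unspec { None, Tlsdesc, Dtpoff };

struct Node {
  Code code = Code::Reg;
  Unspec unspec = Unspec::None;  // meaningful only when code == Code::Unspec
  std::string name;              // symbol or register name
  std::vector<std::shared_ptr<Node>> ops;
};
using NodeP = std::shared_ptr<Node>;

// Hypothetical helper to build a node.
NodeP mk_node(Code c, Unspec u, std::string name, std::vector<NodeP> ops)
{
  auto p = std::make_shared<Node>();
  p->code = c;
  p->unspec = u;
  p->name = std::move(name);
  p->ops = std::move(ops);
  return p;
}

// Return the symbol name that identifies SRC for CSE purposes.
std::string tls_symbol_name(const NodeP &src, bool is_combine)
{
  if (!is_combine) {
    // tls64 call: (unspec [(symbol_ref "e") (reg) (sp)] UNSPEC_TLSDESC)
    assert(src->code == Code::Unspec && src->unspec == Unspec::Tlsdesc);
    return src->ops.at(0)->name;
  }
  // tls64 combine: (plus (unspec [...] UNSPEC_TLSDESC)
  //                      (const (unspec [(symbol_ref "foo")] UNSPEC_DTPOFF)))
  assert(src->code == Code::Plus);
  const NodeP &cst = src->ops.at(1);
  assert(cst->code == Code::Const);
  const NodeP &dtpoff = cst->ops.at(0);
  assert(dtpoff->code == Code::Unspec && dtpoff->unspec == Unspec::Dtpoff);
  return dtpoff->ops.at(0)->name;
}
```

In this model the combine case never looks at operand 1 of the UNSPEC_TLSDESC, so it does not matter whether that operand is a REG or a nested UNSPEC as in the dump above.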

>
> H.J.



-- 
BR,
Hongtao
