On Tue, Jul 29, 2025 at 9:28 AM H.J. Lu <hjl.to...@gmail.com> wrote: > > On Mon, Jul 28, 2025 at 01:53:08PM -0700, H.J. Lu wrote: > > On Mon, Jul 28, 2025 at 04:51:24PM +0800, Hongtao Liu wrote: > > > On Wed, Jul 23, 2025 at 8:07 AM H.J. Lu <hjl.to...@gmail.com> wrote: > > > > > > > > For TLS calls: > > > > > > > > 1. UNSPEC_TLS_GD: > > > > > > > > (parallel [ > > > > (set (reg:DI 0 ax) > > > > (call:DI (mem:QI (symbol_ref:DI ("__tls_get_addr"))) > > > > (const_int 0 [0]))) > > > > (unspec:DI [(symbol_ref:DI ("e") [flags 0x50]) > > > > (reg/f:DI 7 sp)] UNSPEC_TLS_GD) > > > > (clobber (reg:DI 5 di))]) > > > > > > > > 2. UNSPEC_TLS_LD_BASE: > > > > > > > > (parallel [ > > > > (set (reg:DI 0 ax) > > > > (call:DI (mem:QI (symbol_ref:DI ("__tls_get_addr"))) > > > > (const_int 0 [0]))) > > > > (unspec:DI [(reg/f:DI 7 sp)] UNSPEC_TLS_LD_BASE)]) > > > > > > > > 3. UNSPEC_TLSDESC: > > > > > > > > (parallel [ > > > > (set (reg/f:DI 104) > > > > (plus:DI (unspec:DI [ > > > > (symbol_ref:DI ("_TLS_MODULE_BASE_") [flags > > > > 0x10]) > > > > (reg:DI 114) > > > > (reg/f:DI 7 sp)] UNSPEC_TLSDESC) > > > > (const:DI (unspec:DI [ > > > > (symbol_ref:DI ("e") [flags 0x1a]) > > > > ] UNSPEC_DTPOFF)))) > > > > (clobber (reg:CC 17 flags))]) > > > > > > > > (parallel [ > > > > (set (reg:DI 101) > > > > (unspec:DI [(symbol_ref:DI ("e") [flags 0x50]) > > > > (reg:DI 112) > > > > (reg/f:DI 7 sp)] UNSPEC_TLSDESC)) > > > > (clobber (reg:CC 17 flags))]) > > > > > > > > they return the same value for the same input value. 
But multiple calls > > > > with the same input value may be generated for simple programs like: > > > > > > > > void a(long *); > > > > int b(void); > > > > void c(void); > > > > static __thread long e; > > > > long > > > > d(void) > > > > { > > > > a(&e); > > > > if (b()) > > > > c(); > > > > return e; > > > > } > > > > > > > > When compiled with -O2 -fPIC -mtls-dialect=gnu2, the following codes are > > > > generated: > > > > > > > > .type d, @function > > > > d: > > > > .LFB0: > > > > .cfi_startproc > > > > pushq %rbx > > > > .cfi_def_cfa_offset 16 > > > > .cfi_offset 3, -16 > > > > leaq e@TLSDESC(%rip), %rbx > > > > movq %rbx, %rax > > > > call *e@TLSCALL(%rax) > > > > addq %fs:0, %rax > > > > movq %rax, %rdi > > > > call a@PLT > > > > call b@PLT > > > > testl %eax, %eax > > > > jne .L8 > > > > movq %rbx, %rax > > > > call *e@TLSCALL(%rax) > > > > popq %rbx > > > > .cfi_remember_state > > > > .cfi_def_cfa_offset 8 > > > > movq %fs:(%rax), %rax > > > > ret > > > > .p2align 4,,10 > > > > .p2align 3 > > > > .L8: > > > > .cfi_restore_state > > > > call c@PLT > > > > movq %rbx, %rax > > > > call *e@TLSCALL(%rax) > > > > popq %rbx > > > > .cfi_def_cfa_offset 8 > > > > movq %fs:(%rax), %rax > > > > ret > > > > .cfi_endproc > > > > > > > > There are 3 "call *e@TLSCALL(%rax)". They all return the same value. 
> > > > Rename the remove_redundant_vector pass to the x86_cse pass, for 64bit, > > > > extend it to also remove redundant TLS calls to generate: > > > > > > > > d: > > > > .LFB0: > > > > .cfi_startproc > > > > pushq %rbx > > > > .cfi_def_cfa_offset 16 > > > > .cfi_offset 3, -16 > > > > leaq e@TLSDESC(%rip), %rax > > > > movq %fs:0, %rdi > > > > call *e@TLSCALL(%rax) > > > > addq %rax, %rdi > > > > movq %rax, %rbx > > > > call a@PLT > > > > call b@PLT > > > > testl %eax, %eax > > > > jne .L8 > > > > movq %fs:(%rbx), %rax > > > > popq %rbx > > > > .cfi_remember_state > > > > .cfi_def_cfa_offset 8 > > > > ret > > > > .p2align 4,,10 > > > > .p2align 3 > > > > .L8: > > > > .cfi_restore_state > > > > call c@PLT > > > > movq %fs:(%rbx), %rax > > > > popq %rbx > > > > .cfi_def_cfa_offset 8 > > > > ret > > > > .cfi_endproc > > > > > > > > with only one "call *e@TLSCALL(%rax)". This reduces the number of > > > > __tls_get_addr calls in libgcc.a by 72%: > > > > > > > > __tls_get_addr calls before after > > > > libgcc.a 868 243 > > > > > > > > gcc/ > > > > > > > > PR target/81501 > > > > * config/i386/i386-features.cc (x86_cse_kind): Add > > > > X86_CSE_TLS_GD, > > > > X86_CSE_TLS_LD_BASE and X86_CSE_TLSDESC. > > > > (redundant_load): Renamed to ... > > > > (redundant_pattern): This. > > > > (replace_tls_call): New. > > > > (ix86_place_single_tls_call): Likewise. > > > > (pass_remove_redundant_vector_load): Renamed to ... > > > > (pass_x86_cse): This. Add val, def_insn, mode, scalar_mode, > > > > kind, > > > > x86_cse, candidate_gnu_tls_p, candidate_gnu2_tls_p and > > > > candidate_vector_p. > > > > (pass_x86_cse::candidate_gnu_tls_p): New. > > > > (pass_x86_cse::candidate_gnu2_tls_p): Likewise. > > > > (pass_x86_cse::candidate_vector_p): Likewise. > > > > (remove_redundant_vector_load): Renamed to ... > > > > (pass_x86_cse::x86_cse): This. Extend to remove redundant TLS > > > > calls. > > > > (make_pass_remove_redundant_vector_load): Renamed to ... 
> > > > (make_pass_x86_cse): This. > > > > (config/i386/i386-passes.def): Replace > > > > pass_remove_redundant_vector_load with pass_x86_cse. > > > > config/i386/i386-protos.h (ix86_tls_get_addr): New. > > > > (make_pass_remove_redundant_vector_load): Renamed to ... > > > > (make_pass_x86_cse): This. > > > > * config/i386/i386.cc (ix86_tls_get_addr): Remove static. > > > > * config/i386/i386.h (machine_function): Add > > > > tls_descriptor_call_multiple_p. > > > > * config/i386/i386.md (tls64): New attribute. > > > > (@tls_global_dynamic_64_<mode>): Set > > > > tls_descriptor_call_multiple_p. > > > > (@tls_local_dynamic_base_64_<mode>): Likewise. > > > > (@tls_dynamic_gnu2_64_<mode>): Likewise. > > > > (*tls_global_dynamic_64_<mode>): Set tls64 attribute to gd. > > > > (*tls_local_dynamic_base_64_<mode>): Set tls64 attribute to > > > > ld_base. > > > > (*tls_dynamic_gnu2_lea_64_<mode>): Set tls64 attribute to lea. > > > > (*tls_dynamic_gnu2_call_64_<mode>): Set tls64 attribute to call. > > > > (*tls_dynamic_gnu2_combine_64_<mode>): Set tls64 attribute to > > > > combine. > > > > > > > > gcc/testsuite/ > > > > > > > > PR target/81501 > > > > * g++.target/i386/pr81501-1.C: New test. > > > > * gcc.target/i386/pr81501-1a.c: Likewise. > > > > * gcc.target/i386/pr81501-1b.c: Likewise. > > > > * gcc.target/i386/pr81501-2a.c: Likewise. > > > > * gcc.target/i386/pr81501-2b.c: Likewise. > > > > * gcc.target/i386/pr81501-3.c: Likewise. > > > > * gcc.target/i386/pr81501-4a.c: Likewise. > > > > * gcc.target/i386/pr81501-4b.c: Likewise. > > > > * gcc.target/i386/pr81501-5.c: Likewise. > > > > * gcc.target/i386/pr81501-6a.c: Likewise. > > > > * gcc.target/i386/pr81501-6b.c: Likewise. > > > > * gcc.target/i386/pr81501-7.c: Likewise. > > > > * gcc.target/i386/pr81501-8a.c: Likewise. > > > > * gcc.target/i386/pr81501-8b.c: Likewise. > > > > * gcc.target/i386/pr81501-9a.c: Likewise. > > > > * gcc.target/i386/pr81501-9b.c: Likewise. > > > > > > > > Signed-off-by: H.J. 
Lu <hjl.to...@gmail.com> > > > > --- > > > > gcc/config/i386/i386-features.cc | 838 +++++++++++++++++---- > > > > gcc/config/i386/i386-passes.def | 2 +- > > > > gcc/config/i386/i386-protos.h | 4 +- > > > > gcc/config/i386/i386.cc | 2 +- > > > > gcc/config/i386/i386.h | 3 + > > > > gcc/config/i386/i386.md | 25 +- > > > > gcc/testsuite/g++.target/i386/pr81501-1.C | 16 + > > > > gcc/testsuite/gcc.target/i386/pr81501-1a.c | 17 + > > > > gcc/testsuite/gcc.target/i386/pr81501-1b.c | 6 + > > > > gcc/testsuite/gcc.target/i386/pr81501-2a.c | 17 + > > > > gcc/testsuite/gcc.target/i386/pr81501-2b.c | 6 + > > > > gcc/testsuite/gcc.target/i386/pr81501-3.c | 9 + > > > > gcc/testsuite/gcc.target/i386/pr81501-4a.c | 51 ++ > > > > gcc/testsuite/gcc.target/i386/pr81501-4b.c | 6 + > > > > gcc/testsuite/gcc.target/i386/pr81501-5.c | 13 + > > > > gcc/testsuite/gcc.target/i386/pr81501-6a.c | 67 ++ > > > > gcc/testsuite/gcc.target/i386/pr81501-6b.c | 28 + > > > > gcc/testsuite/gcc.target/i386/pr81501-7.c | 20 + > > > > gcc/testsuite/gcc.target/i386/pr81501-8a.c | 82 ++ > > > > gcc/testsuite/gcc.target/i386/pr81501-8b.c | 31 + > > > > gcc/testsuite/gcc.target/i386/pr81501-9a.c | 39 + > > > > gcc/testsuite/gcc.target/i386/pr81501-9b.c | 22 + > > > > 22 files changed, 1148 insertions(+), 156 deletions(-) > > > > create mode 100644 gcc/testsuite/g++.target/i386/pr81501-1.C > > > > create mode 100644 gcc/testsuite/gcc.target/i386/pr81501-1a.c > > > > create mode 100644 gcc/testsuite/gcc.target/i386/pr81501-1b.c > > > > create mode 100644 gcc/testsuite/gcc.target/i386/pr81501-2a.c > > > > create mode 100644 gcc/testsuite/gcc.target/i386/pr81501-2b.c > > > > create mode 100644 gcc/testsuite/gcc.target/i386/pr81501-3.c > > > > create mode 100644 gcc/testsuite/gcc.target/i386/pr81501-4a.c > > > > create mode 100644 gcc/testsuite/gcc.target/i386/pr81501-4b.c > > > > create mode 100644 gcc/testsuite/gcc.target/i386/pr81501-5.c > > > > create mode 100644 gcc/testsuite/gcc.target/i386/pr81501-6a.c > 
> > > create mode 100644 gcc/testsuite/gcc.target/i386/pr81501-6b.c > > > > create mode 100644 gcc/testsuite/gcc.target/i386/pr81501-7.c > > > > create mode 100644 gcc/testsuite/gcc.target/i386/pr81501-8a.c > > > > create mode 100644 gcc/testsuite/gcc.target/i386/pr81501-8b.c > > > > create mode 100644 gcc/testsuite/gcc.target/i386/pr81501-9a.c > > > > create mode 100644 gcc/testsuite/gcc.target/i386/pr81501-9b.c > > > > > > > > diff --git a/gcc/config/i386/i386-features.cc > > > > b/gcc/config/i386/i386-features.cc > > > > index c131577805f..80a1e6caa0e 100644 > > > > --- a/gcc/config/i386/i386-features.cc > > > > +++ b/gcc/config/i386/i386-features.cc > > > > @@ -3493,10 +3493,13 @@ enum x86_cse_kind > > > > { > > > > X86_CSE_CONST0_VECTOR, > > > > X86_CSE_CONSTM1_VECTOR, > > > > - X86_CSE_VEC_DUP > > > > + X86_CSE_VEC_DUP, > > > > + X86_CSE_TLS_GD, > > > > + X86_CSE_TLS_LD_BASE, > > > > + X86_CSE_TLSDESC > > > > }; > > > > > > > > -struct redundant_load > > > > +struct redundant_pattern > > > > { > > > > /* Bitmap of basic blocks with broadcast instructions. */ > > > > auto_bitmap bbs; > > > > @@ -3669,22 +3672,541 @@ ix86_broadcast_inner (rtx op, machine_mode > > > > mode, > > > > return op; > > > > } > > > > > > > > -/* At entry of the nearest common dominator for basic blocks with > > > > vector > > > > - CONST0_RTX and integer CONSTM1_RTX uses, generate a single widest > > > > - vector set instruction for all CONST0_RTX and integer CONSTM1_RTX > > > > - uses. > > > > +/* Replace CALL instruction in TLS_CALL_INSNS with SET from SRC. */ > > > > > > > > - NB: We want to generate only a single widest vector set to cover the > > > > - whole function. The LCM algorithm isn't appropriate here since it > > > > - may place a vector set inside the loop. 
*/ > > > > +static void > > > > +replace_tls_call (rtx src, auto_bitmap &tls_call_insns) > > > > +{ > > > > + bitmap_iterator bi; > > > > + unsigned int id; > > > > > > > > -static unsigned int > > > > -remove_redundant_vector_load (void) > > > > + EXECUTE_IF_SET_IN_BITMAP (tls_call_insns, 0, id, bi) > > > > + { > > > > + rtx_insn *insn = DF_INSN_UID_GET (id)->insn; > > > > + > > > > + /* If this isn't a CALL, only GNU2 TLS implicit CALL patterns are > > > > + allowed. */ > > > > > > > + if (!CALL_P (insn)) > > > > + { > > > > + attr_tls64 tls64 = get_attr_tls64 (insn); > > > > + if (tls64 != TLS64_CALL && tls64 != TLS64_COMBINE) > > > > + gcc_unreachable (); > > > > + } > > > > + > > > > + rtx pat = PATTERN (insn); > > > > + if (GET_CODE (pat) != PARALLEL) > > > > + gcc_unreachable (); > > > > + > > > > + int j; > > > > + rtx op, dest = nullptr; > > > > + for (j = XVECLEN (pat, 0) - 1; j >= 0; j--) > > > > > > SET is always at the first of parallel for tls64 > > > "combine/call/ld_base/gd", so no need for the iteration? > > > > > > > Fixed in v4. > > > > > > + { > > > > + op = XVECEXP (pat, 0, j); > > > > + if (GET_CODE (op) == SET) > > > > + { > > > > + dest = SET_DEST (op); > > > > + break; > > > > + } > > > > + } > > > > + > > > > + rtx set = gen_rtx_SET (dest, src); > > > > + rtx_insn *set_insn = emit_insn_after (set, insn); > > > > + if (recog_memoized (set_insn) < 0) > > > > + gcc_unreachable (); > > > > + > > > > + if (dump_file) > > > > + { > > > > + fprintf (dump_file, "\nReplace:\n\n"); > > > > + print_rtl_single (dump_file, insn); > > > > + fprintf (dump_file, "\nwith:\n\n"); > > > > + print_rtl_single (dump_file, set_insn); > > > > + fprintf (dump_file, "\n"); > > > > + } > > > > + > > > > + /* Delete the CALL insn. 
*/ > > > > + delete_insn (insn); > > > > + > > > > + df_insn_rescan (set_insn); > > > > + } > > > > +} > > > > + > > > > +/* Generate a TLS call of KIND with VAL and copy the call result to > > > > DEST, > > > > + at entry of the nearest dominator for basic block map BBS, which is > > > > in > > > > + the fake loop that contains the whole function, so that there is > > > > only > > > > + a single TLS CALL of KIND with VAL in the whole function. If > > > > + TLSDESC_SET isn't nullptr, insert it before the TLS call. */ > > > > + > > > > +static void > > > > +ix86_place_single_tls_call (rtx dest, rtx val, x86_cse_kind kind, > > > > + bitmap bbs, rtx tlsdesc_set = nullptr) > > > > +{ > > > > + basic_block bb = nearest_common_dominator_for_set (CDI_DOMINATORS, > > > > bbs); > > > > + while (bb->loop_father->latch > > > > + != EXIT_BLOCK_PTR_FOR_FN (cfun)) > > > > + bb = get_immediate_dominator (CDI_DOMINATORS, > > > > + bb->loop_father->header); > > > > + > > > > + rtx_insn *insn = BB_HEAD (bb); > > > > + while (insn && !NONDEBUG_INSN_P (insn)) > > > > + { > > > > + if (insn == BB_END (bb)) > > > > + { > > > > + insn = NULL; > > > > + break; > > > > + } > > > > + insn = NEXT_INSN (insn); > > > > + } > > > > + > > > > + rtx rax = nullptr, rdi; > > > > + rtx eqv = nullptr; > > > > + rtx caddr; > > > > + rtx set; > > > > + rtx clob; > > > > + rtx symbol; > > > > + rtx tls; > > > > + rtx_insn *tls_insn; > > > > + > > > > + switch (kind) > > > > + { > > > > + case X86_CSE_TLS_GD: > > > > + rax = gen_rtx_REG (Pmode, AX_REG); > > > > + rdi = gen_rtx_REG (Pmode, DI_REG); > > > > + caddr = ix86_tls_get_addr (); > > > > + > > > > + symbol = XVECEXP (val, 0, 0); > > > > + tls = gen_tls_global_dynamic_64 (Pmode, rax, symbol, caddr, rdi); > > > > + > > > > + if (GET_MODE (symbol) != Pmode) > > > > + symbol = gen_rtx_ZERO_EXTEND (Pmode, symbol); > > > > + eqv = symbol; > > > > + break; > > > > + > > > > + case X86_CSE_TLS_LD_BASE: > > > > + rax = gen_rtx_REG (Pmode, AX_REG); > > > > 
+ rdi = gen_rtx_REG (Pmode, DI_REG); > > > > > > Considering that the pass is before register allocation, if we use a > > > pseudo-register, RA will handle the pattern with clobber rdi/rci/rax > > > by itself. > > > > These patterns take explicit RAX and RDI register operands. Using > > pseudo-registers doesn't work: > > > > /export/build/gnu/tools-build/gcc-gitlab-debug/build-x86_64-linux/gcc/xgcc > > -B/export/build/gnu/tools-build/gcc-gitlab-debug/build-x86_64-linux/gcc/ > > -O2 -fPIC -mx32 -S pr81501-1a.c > > pr81501-1a.c: In function ‘d’: > > pr81501-1a.c:15:1: error: unable to generate reloads for: > > 15 | } > > | ^ > > (call_insn/u 43 2 44 2 (parallel [ > > (set (reg:SI 112) > > (call:SI (mem:QI (symbol_ref:SI ("__tls_get_addr")) [0 S1 > > A8]) > > (const_int 0 [0]))) > > (unspec:SI [ > > (reg/f:SI 7 sp) > > ] UNSPEC_TLS_LD_BASE) > > (clobber (reg:SI 113)) > > ]) 1660 {*tls_local_dynamic_base_64_si} > > (expr_list:REG_EH_REGION (const_int -2147483648 [0xffffffff80000000]) > > (nil)) > > (nil)) > > during RTL pass: reload > > pr81501-1a.c:15:1: internal compiler error: in curr_insn_transform, at > > lra-constraints.cc:4372 > > > > > > > > > + caddr = ix86_tls_get_addr (); > > > > + > > > > + tls = gen_tls_local_dynamic_base_64 (Pmode, rax, caddr, rdi); > > > > + > > > > + /* Attach a unique REG_EQUAL to DEST, to allow the RTL optimizers > > > > + to share the LD_BASE result with other LD model accesses. 
*/ > > > > + eqv = gen_rtx_UNSPEC (Pmode, gen_rtvec (1, const0_rtx), > > > > + UNSPEC_TLS_LD_BASE); > > > > + > > > > + break; > > > > + > > > > + case X86_CSE_TLSDESC: > > > > + set = gen_rtx_SET (dest, val); > > > > + clob = gen_rtx_CLOBBER (VOIDmode, > > > > + gen_rtx_REG (CCmode, FLAGS_REG)); > > > > + tls = gen_rtx_PARALLEL (VOIDmode, gen_rtvec (2, set, clob)); > > > > + break; > > > > + > > > > + default: > > > > + gcc_unreachable (); > > > > + } > > > > + > > > > + rtx_insn *before = nullptr; > > > > + rtx_insn *after = nullptr; > > > > + if (insn == BB_HEAD (bb)) > > > > + before = insn; > > > > + else > > > > + after = insn ? PREV_INSN (insn) : BB_END (bb); > > > > + > > > > + /* TLS_GD and TLS_LD_BASE instructions are normal functions which > > > > + clobber caller-saved registers. TLSDESC instructions are special > > > > + functions which only clobber RAX. If any registers clobbered by > > > > + the TLS instruction are live in this basic block, we must insert > > > > + the TLS instruction after all live registers clobbered by the TLS > > > > + instruction are dead. */ > > > > + > > > > + auto_bitmap live_caller_saved_regs; > > > > + bitmap in = df_live ? DF_LIVE_IN (bb) : DF_LR_IN (bb); > > > > + > > > > + bool flags_live_p = bitmap_bit_p (in, FLAGS_REG); > > > > + > > > > + unsigned int i; > > > > + > > > > + /* Get all live caller-saved registers. */ > > > > + if (kind == X86_CSE_TLSDESC) > > > > + { > > > > + if (bitmap_bit_p (in, AX_REG)) > > > > + bitmap_set_bit (live_caller_saved_regs, AX_REG); > > > > > > And we don't need to check for those hard registers here and below? > > > > TLS_GD and TLS_LD_BASE instructions are normal functions which > > clobber caller-saved registers. TLSDESC instructions are special > > functions which only clobber RAX. live_caller_saved_regs captures > > live caller-saved registers for these TLS instructions.
I notice that those insns are CALL_INSNs and that, per the ABI, rax/rdi/rsi are caller-saved registers. So even if we explicitly use (clobber (reg: RAX)), RA will take care of saving and restoring the register?

> > > > > > > > > + } > > > > + else > > > > + for (i = 0; i < FIRST_PSEUDO_REGISTER; i++) > > > > + if (call_used_regs[i] > > > > + && !fixed_regs[i] > > > > + && bitmap_bit_p (in, i)) > > > > + bitmap_set_bit (live_caller_saved_regs, i); > > > > + > > > > + if (!bitmap_empty_p (live_caller_saved_regs)) > > > > + { > > > > + /* Search for REG_DEAD notes in this basic block. */ > > > > + FOR_BB_INSNS (bb, insn) > > > > + { > > > > + if (!NONDEBUG_INSN_P (insn)) > > > > + continue; > > > > + > > > > + /* Check if FLAGS register is live. */ > > > > + set = single_set (insn); > > > > + if (set) > > > > + { > > > > + rtx dest = SET_DEST (set); > > > > + if (REG_P (dest) && REGNO (dest) == FLAGS_REG) > > > > + flags_live_p = true; > > > > + } > > > > + > > > > + rtx link; > > > > + for (link = REG_NOTES (insn); link; link = XEXP (link, 1)) > > > > + if (REG_NOTE_KIND (link) == REG_DEAD > > > > + && REG_P (XEXP (link, 0))) > > > > + { > > > > + /* Mark the live caller-saved register as dead. */ > > > > + for (i = REGNO (XEXP (link, 0)); > > > > + i < END_REGNO (XEXP (link, 0)); > > > > + i++) > > > > + bitmap_clear_bit (live_caller_saved_regs, i); > > > > + > > > > + /* Check if FLAGS register is dead. */ > > > > + if (REGNO (XEXP (link, 0)) == FLAGS_REG) > > > > + flags_live_p = false; > > > > + > > > > + if (bitmap_empty_p (live_caller_saved_regs)) > > > > + { > > > > + /* All live caller-saved registers are dead after > > > > + this instruction. Since TLS instructions > > > > + clobber FLAGS register, it must be dead where > > > > + the TLS will be inserted after. 
*/ > > > > + if (flags_live_p) > > > > + gcc_unreachable (); > > > > + after = insn; > > > > + goto insert_after; > > > > + } > > > > + } > > > > + } > > > > + > > > > + /* All live caller-saved registers should be dead at the end > > > > + of this basic block. */ > > > > + gcc_unreachable (); > > > > + } > > > > + > > > > + /* Emit the TLS CALL insn. */ > > > > + if (after) > > > > + { > > > > +insert_after: > > > > + tls_insn = emit_insn_after (tls, after); > > > > + } > > > > + else > > > > + tls_insn = emit_insn_before (tls, before); > > > > + > > > > + rtx_insn *tlsdesc_insn = nullptr; > > > > + if (tlsdesc_set) > > > > + { > > > > + rtx dest = copy_rtx (SET_DEST (tlsdesc_set)); > > > > + rtx src = copy_rtx (SET_SRC (tlsdesc_set)); > > > > + tlsdesc_set = gen_rtx_SET (dest, src); > > > > + tlsdesc_insn = emit_insn_before (tlsdesc_set, tls_insn); > > > > + } > > > > + > > > > + if (kind != X86_CSE_TLSDESC) > > > > + { > > > > + RTL_CONST_CALL_P (tls_insn) = 1; > > > > + > > > > + /* Indicate that this function can't jump to non-local gotos. 
*/ > > > > + make_reg_eh_region_note_nothrow_nononlocal (tls_insn); > > > > + } > > > > + > > > > + if (recog_memoized (tls_insn) < 0) > > > > + gcc_unreachable (); > > > > + > > > > + if (dump_file) > > > > + { > > > > + if (after) > > > > + { > > > > + fprintf (dump_file, "\nPlace:\n\n"); > > > > + if (tlsdesc_insn) > > > > + print_rtl_single (dump_file, tlsdesc_insn); > > > > + print_rtl_single (dump_file, tls_insn); > > > > + fprintf (dump_file, "\nafter:\n\n"); > > > > + print_rtl_single (dump_file, after); > > > > + fprintf (dump_file, "\n"); > > > > + } > > > > + else > > > > + { > > > > + fprintf (dump_file, "\nPlace:\n\n"); > > > > + if (tlsdesc_insn) > > > > + print_rtl_single (dump_file, tlsdesc_insn); > > > > + print_rtl_single (dump_file, tls_insn); > > > > + fprintf (dump_file, "\nbefore:\n\n"); > > > > + print_rtl_single (dump_file, insn); > > > > + fprintf (dump_file, "\n"); > > > > + } > > > > + } > > > > + > > > > + if (kind != X86_CSE_TLSDESC) > > > > + { > > > > + /* Copy RAX to DEST. 
*/ > > > > + set = gen_rtx_SET (dest, rax); > > > > + rtx_insn *set_insn = emit_insn_after (set, tls_insn); > > > > + set_dst_reg_note (set_insn, REG_EQUAL, copy_rtx (eqv), dest); > > > > + if (dump_file) > > > > + { > > > > + fprintf (dump_file, "\nPlace:\n\n"); > > > > + print_rtl_single (dump_file, set_insn); > > > > + fprintf (dump_file, "\nafter:\n\n"); > > > > + print_rtl_single (dump_file, tls_insn); > > > > + fprintf (dump_file, "\n"); > > > > + } > > > > + } > > > > +} > > > > + > > > > +namespace { > > > > + > > > > +const pass_data pass_data_x86_cse = > > > > +{ > > > > + RTL_PASS, /* type */ > > > > + "x86_cse", /* name */ > > > > + OPTGROUP_NONE, /* optinfo_flags */ > > > > + TV_MACH_DEP, /* tv_id */ > > > > + 0, /* properties_required */ > > > > + 0, /* properties_provided */ > > > > + 0, /* properties_destroyed */ > > > > + 0, /* todo_flags_start */ > > > > + 0, /* todo_flags_finish */ > > > > +}; > > > > + > > > > +class pass_x86_cse : public rtl_opt_pass > > > > +{ > > > > +public: > > > > + pass_x86_cse (gcc::context *ctxt) > > > > + : rtl_opt_pass (pass_data_x86_cse, ctxt) > > > > + {} > > > > + > > > > + /* opt_pass methods: */ > > > > + bool gate (function *fun) final override > > > > + { > > > > + return (TARGET_SSE2 > > > > + && optimize > > > > + && optimize_function_for_speed_p (fun)); > > > > + } > > > > + > > > > + unsigned int execute (function *) final override > > > > + { > > > > + return x86_cse (); > > > > + } > > > > + > > > > +private: > > > > + /* The redundant source value. */ > > > > + rtx val; > > > > + /* The instruction which defines the redundant value. */ > > > > + rtx_insn *def_insn; > > > > + /* Mode of the destination of the candidate redundant instruction. > > > > */ > > > > + machine_mode mode; > > > > + /* Mode of the source of the candidate redundant instruction. */ > > > > + machine_mode scalar_mode; > > > > + /* The classification of the candidate redundant instruction. 
*/ > > > > + x86_cse_kind kind; > > > > + > > > > + unsigned int x86_cse (void); > > > > + bool candidate_gnu_tls_p (rtx_insn *, attr_tls64); > > > > + bool candidate_gnu2_tls_p (rtx, attr_tls64); > > > > + bool candidate_vector_p (rtx); > > > > +}; // class pass_x86_cse > > > > + > > > > +/* Return true and output def_insn, val, mode, scalar_mode and kind if > > > > + INSN is UNSPEC_TLS_GD or UNSPEC_TLS_LD_BASE. */ > > > > + > > > > +bool > > > > +pass_x86_cse::candidate_gnu_tls_p (rtx_insn *insn, attr_tls64 tls64) > > > > +{ > > > > + if (!TARGET_64BIT || !cfun->machine->tls_descriptor_call_multiple_p) > > > > + return false; > > > > + > > > > + /* Record the redundant TLS CALLs for 64-bit: > > > > + > > > > + (parallel [ > > > > + (set (reg:DI 0 ax) > > > > + (call:DI (mem:QI (symbol_ref:DI ("__tls_get_addr"))) > > > > + (const_int 0 [0]))) > > > > + (unspec:DI [(symbol_ref:DI ("foo") [flags 0x50]) > > > > + (reg/f:DI 7 sp)] UNSPEC_TLS_GD) > > > > + (clobber (reg:DI 5 di))]) > > > > + > > > > + > > > > + and > > > > + > > > > + (parallel [ > > > > + (set (reg:DI 0 ax) > > > > + (call:DI (mem:QI (symbol_ref:DI ("__tls_get_addr"))) > > > > + (const_int 0 [0]))) > > > > + (unspec:DI [(reg/f:DI 7 sp)] UNSPEC_TLS_LD_BASE)]) > > > > + > > > > + */ > > > > + > > > > + rtx pat = PATTERN (insn); > > > > + rtx set = XVECEXP (pat, 0, 0); > > > > + gcc_assert (GET_CODE (set) == SET); > > > > + rtx dest = SET_DEST (set); > > > > + scalar_mode = mode = GET_MODE (dest); > > > > + val = XVECEXP (pat, 0, 1); > > > > + gcc_assert (GET_CODE (val) == UNSPEC); > > > > + > > > > + if (tls64 == TLS64_GD) > > > > + kind = X86_CSE_TLS_GD; > > > > + else > > > > + kind = X86_CSE_TLS_LD_BASE; > > > > + > > > > + def_insn = nullptr; > > > > + return true; > > > > +} > > > > + > > > > +/* Return true and output def_insn, val, mode, scalar_mode and kind if > > > > + SET is UNSPEC_TLSDESC. 
*/ > > > > + > > > > +bool > > > > +pass_x86_cse::candidate_gnu2_tls_p (rtx set, attr_tls64 tls64) > > > > +{ > > > > + if (!TARGET_64BIT || !cfun->machine->tls_descriptor_call_multiple_p) > > > > + return false; > > > > + > > > > + /* Record GNU2 TLS CALLs for 64-bit: > > > > + > > > > + (set (reg/f:DI 104) > > > > + (plus:DI (unspec:DI [ > > > > + (symbol_ref:DI ("_TLS_MODULE_BASE_") [flags 0x10]) > > > > + (reg:DI 114) > > > > + (reg/f:DI 7 sp)] UNSPEC_TLSDESC) > > > > + (const:DI (unspec:DI [ > > > > + (symbol_ref:DI ("e") [flags 0x1a]) > > > > + ] UNSPEC_DTPOFF)))) > > > > + > > > > + and > > > > + > > > > + (set (reg:DI 101) > > > > + (unspec:DI [(symbol_ref:DI ("foo") [flags 0x50]) > > > > + (reg:DI 112) > > > > + (reg/f:DI 7 sp)] UNSPEC_TLSDESC)) > > > > + > > > > + */ > > > > + > > > > + rtx src = SET_SRC (set); > > > > + if (tls64 == TLS64_CALL) > > > > + val = src; > > > > + else > > > > + { > > > > + val = src; > > > > + src = XEXP (src, 0); > > > > + } > > > > + > > > > > > val = src; > > > if (tls64 != TLS64_CALL) > > > src = XEXP (src, 0); > > > > Fixed in v4. 
> > > > > > > > > + kind = X86_CSE_TLSDESC; > > > > + gcc_assert (GET_CODE (src) == UNSPEC); > > > > + src = XVECEXP (src, 0, 1); > > > > + scalar_mode = mode = GET_MODE (src); > > > > + if (REG_P (src)) > > > > + { > > > > + /* All definitions of reg:DI 129 in > > > > + > > > > + (set (reg:DI 110) > > > > + (unspec:DI [(symbol_ref:DI ("foo")) > > > > + (reg:DI 129) > > > > + (reg/f:DI 7 sp)] UNSPEC_TLSDESC)) > > > > + > > > > + should have the same source as in > > > > + > > > > + (set (reg:DI 129) > > > > + (unspec:DI [(symbol_ref:DI ("foo"))] UNSPEC_TLSDESC)) > > > > + > > > > + */ > > > > + > > > > + df_ref ref; > > > > + rtx_insn *set_insn = nullptr; > > > > + rtx tls_src = nullptr; > > > > + for (ref = DF_REG_DEF_CHAIN (REGNO (src)); > > > > + ref; > > > > + ref = DF_REF_NEXT_REG (ref)) > > > > > > I think we just need to check if XVECEXP (src, 0, 0) /* > > > "(symbol_ref:DI ("foo"))" */ is the same since XVECEXP (src, 0, 1) is > > > always set by XVECEXP (src, 0, 0), according to > > > "tls_dynamic_gnu2_64_<mode>" > > > > v4 is changed to: > > > > rtx tls_symbol = XVECEXP (src, 0, 0); > > src = XVECEXP (src, 0, 1); > > ... > > rtx tls_set = PATTERN (set_insn); > > rtx tls_src = XVECEXP (SET_SRC (tls_set), 0, 0); > > if (!rtx_equal_p (tls_symbol, tls_src)) > > { > > set_insn = nullptr; > > break; > > }

According to @tls_dynamic_gnu2_64_<mode>, tls_src must be equal to tls_symbol. Do we really need to walk the def-use chain to check that? We could just record tls_symbol as the VAL.

> > > > > > > > > + { > > > > + if (DF_REF_IS_ARTIFICIAL (ref)) > > > > + break; > > > > + > > > > + set_insn = DF_REF_INSN (ref); > > > > + tls64 = get_attr_tls64 (set_insn); > > > > + if (tls64 != TLS64_LEA) > > > > + { > > > > + set_insn = nullptr; > > > > + break; > > > > + } > > > > + > > > > + rtx tls_set = PATTERN (set_insn); > > > > + if (!tls_src) > > > > + tls_src = SET_SRC (tls_set); > > > > + else if (!rtx_equal_p (tls_src, SET_SRC (tls_set))) > > > > + { > > > > + set_insn = nullptr; > > > > + break; > > > > + } > > > > + } > > > > + > > > > + if (!set_insn) > > > > + return false; > > > > + > > > > + def_insn = set_insn; > > > > + } > > > > + else if (GET_CODE (src) == UNSPEC > > > > + && XINT (src, 1) == UNSPEC_TLSDESC > > > > + && SYMBOL_REF_P (XVECEXP (src, 0, 0))) > > > > + def_insn = nullptr; > > > > > > Similar for here, it's supposed to handle > > > "*tls_dynamic_gnu2_combine_64_<mode>", according to the splitter > > > pattern, the output value can also be CSEd with tls64_call when ever > > > symbol_ref in the second operand of PLUS is the same. > > > > v4 is changed to > > > > rtx tls_symbol = XVECEXP (src, 0, 0); > > src = XVECEXP (src, 0, 1); > > scalar_mode = mode = GET_MODE (src); > > gcc_assert (REG_P (src)); > > Since this triggered the assert on > > (set (reg/f:DI 103) > (plus:DI (unspec:DI [ > (symbol_ref:DI ("_TLS_MODULE_BASE_") [flags 0x10]) > (unspec:DI [ > (symbol_ref:DI ("_TLS_MODULE_BASE_") [flags 0x10]) > ] UNSPEC_TLSDESC) > (reg/f:DI 7 sp) > ] UNSPEC_TLSDESC) > (const:DI (unspec:DI [ > (symbol_ref:DI ("foo") [flags 0x1a] <var_decl > 0x7fffe99dbe40 foo>) > ] UNSPEC_DTPOFF)))) > > I added a testcase and kept the v3 code.

For tls64 combine, tls_symbol should be the second operand of the original src, i.e.

(const:DI (unspec:DI [
(symbol_ref:DI ("foo") [flags 0x1a] <var_decl 0x7fffe99dbe40 foo>) <--- this
] UNSPEC_DTPOFF))))

not (symbol_ref:DI ("_TLS_MODULE_BASE_")), and there is no need for gcc_assert (REG_P (src)).

> > H.J.

--
BR,
Hongtao