Source: gcc-16
Version: 16-20260203-2
Severity: normal
Tags: patch
User: [email protected]
Usertags: sh4
X-Debbugs-Cc: [email protected]

Hello,

since some of the patches to add LRA support for the SH backend have now been merged upstream, the two patches sh-lra-support-doc.diff and sh-lra-support.diff no longer apply.

I have therefore updated both patches against git master (f845b699558).

Please update both in the gcc-16 package and re-enable them.

Thanks,
Adrian

--
 .''`.  John Paul Adrian Glaubitz
: :' :  Debian Developer
`. `'   Physicist
  `-    GPG: 62FF 8A75 84E0 2956 9546  0006 7426 3B37 F5B5 F913

>From c2d30267327ee7339214b5bd070ff9aeb37118f6 Mon Sep 17 00:00:00 2001
From: Oleg Endo <[email protected]>
Date: Sun, 29 Sep 2024 21:33:29 +0900
Subject: [PATCH] SH: Add support for LRA and enable it by default

SH: Tighten memory predicates and constraints

In particular, reject invalid hard-regs for memory address registers when
using LRA. Unfortunately we need to distinguish between old reload and LRA
behaviors for the transitional period. LRA seems to require stricter
predicates and constraints.

gcc/ChangeLog:

	PR target/55212
	* config/sh/predicates.md (simple_mem_operand): Use
	'satisfies_constraint_Sra'.
	(post_inc_mem, pre_dec_mem): Use 'satisfies_constraint_Rab'.
	* config/sh/constraints.md (Rab, Rai, Sgb): New constraints.
	(Sua, Sdd, Snd, Ssd, Sbv, Sra, Ara, Add): Use Rab and Rai constraints.
	* config/sh/sync.md (atomic_mem_operand_0, atomic_mem_operand_1): Reject
	GBR addresses when hard-llcs atomic mode is enabled.

SH: pin input args to hard-regs via predicates for sfuncs

Some sfuncs use a hard reg as input and clobber it with a raw reg pattern.
It seems that LRA doesn't process this clobber pattern. Rewrite these
patterns so that they work with LRA.

gcc/ChangeLog:

	* config/sh/predicates.md (hard_reg_r4, hard_reg_r5, hard_reg_r6): New
	predicates.
	* config/sh/sh.md (udivsi3_i4, udivsi3_i4_single, udivsi3_i1): Rewrite
	with match_operand and match_dup.
	(block_lump_real, block_lump_real_i4): Ditto.
	(udivsi3): Adjust for it.
	* config/sh/sh-mem.cc (expand_block_move): Ditto.

LRA: Add cannot_substitute_const_equiv_p target hook

On SH, the special fp constant load instructions 'fldi0' and 'fldi1' are
only valid for single-precision fp mode and thus depend on mode-switching.
Since LRA is not aware of that, it would emit such constant loads in the
wrong mode. The new target hook allows rejecting such potentially unsafe
substitutions.

gcc/ChangeLog:

	PR target/117182
	* target.def (cannot_substitute_const_equiv_p): New target hook.
	* doc/tm.texi.in: Add it.
	* lra-constraints.cc (get_equiv): Use it.
	* config/sh/sh.cc (sh_cannot_substitute_const_equiv_p): Override it.
	* doc/tm.texi: Re-generate.

SH: try to work around fp-reg related move insns

LRA will try to satisfy the constraints in match_scratch for the memory
displacements, which causes issues on this target. To mitigate the issue,
split movsf_ie_ra into several new patterns to remove match_scratch. Also
define a new sub-pattern of movdf for constant loads.

gcc/ChangeLog:

	* gcc/config/sh/predicates.md (pc_relative_load_operand): New predicate.
	* gcc/config/sh/sh-protos.h (sh_movsf_ie_ra_split_p): Remove.
	(sh_movsf_ie_y_split_p): New proto.
	* gcc/config/sh/sh.cc (sh_movsf_ie_ra_split_p): Remove.
	(sh_movsf_ie_y_split_p): New function.
	(broken_move): Take movsf_ie_ra into account for fldi cases.
	* gcc/config/sh/sh.md (movdf_i4_F_z): New insn pattern.
	(movdf): Use it.
	(movsf_ie_ra): Use define_insn instead of define_insn_and_split.
	(movsf_ie_F_z, movsf_ie_Q_z, movsf_ie_y): New insn patterns.
	(movsf): Use new patterns.
	(movsf-1): Don't split when operands[0] or operands[1] is fpul.
	(movdf_i4_F_z+7): New splitter.

SH: Try to work around fp-reg related move insns pt.2

The current movsf logic for LRA doesn't work well for reg from/to multiword
subreg. Use a separate pattern movsf_ie_rffr for that case. Also,
movsf_ie_ra should be disabled for reg from/to subreg of SImode. If not, it
is recognizable as such a move when the subreg1 pass tries to split
multiword, because the constraints aren't effective at that stage.

gcc/ChangeLog:

	PR target/55212
	* config/sh/sh-protos.h (sh_movsf_ie_subreg_multiword_p): New proto.
	* config/sh/sh.cc (sh_movsf_ie_subreg_multiword_p): New function.
	* config/sh/sh.md (movsf_ie_rffr): New insn_and_split.
	(movsf): Use movsf_ie_rffr when sh_movsf_ie_subreg_multiword_p is true.
	(movsf_ie_ra): Disable when sh_movsf_ie_y_split_p is true.

SH: Try to reduce R0 live ranges

Some move or extend patterns create long R0 live ranges and could confuse
LRA.

gcc/ChangeLog: * config/sh/sh-protos.h (sh_satisfies_constraint_Sid_subreg_index): Declare. * config/sh/sh.cc (sh_satisfies_constraint_Sid_subreg_index): New function. * config/sh/sh.md (extend<mode>si2_short_mem_disp_z, *mov<mode>_store_mem_index, mov<mode>_store_mem_index): New insn and insn_and_split patterns. (extend<mode>si2, mov<mode>): Use them for LRA. SH: Fix the condition to use movsh_ie_y pattern. gcc/ChangeLog: * config/sh/sh.cc (sh_movsf_ie_y_split_p): Take the subreg of DImode into account. SH: enable LRA by default gcc/ChangeLog: PR target/55212 * conifg/sh/sh.opt (sh_lra_flag): Init to 1. --- gcc/config/sh/constraints.md | 65 +++++-- gcc/config/sh/predicates.md | 37 +++- gcc/config/sh/sh-mem.cc | 4 +- gcc/config/sh/sh-protos.h | 4 +- gcc/config/sh/sh.cc | 79 ++++++-- gcc/config/sh/sh.md | 337 ++++++++++++++++++++++++++++------- gcc/config/sh/sh.opt | 2 +- gcc/config/sh/sync.md | 8 +- gcc/doc/tm.texi | 17 +- gcc/doc/tm.texi.in | 2 + gcc/lra-constraints.cc | 6 +- gcc/target.def | 21 ++- 12 files changed, 464 insertions(+), 118 deletions(-) diff --git a/gcc/config/sh/constraints.md b/gcc/config/sh/constraints.md index 51569cdfd2d..7b476dc6c38 100644 --- a/gcc/config/sh/constraints.md +++ b/gcc/config/sh/constraints.md @@ -45,8 +45,10 @@ ;; H: Floating point 1 ;; Q: pc relative load operand ;; Rxx: reserved for exotic register classes. +;; Rab: address base register +;; Rai: address index register ;; Sxx: extra memory constraints -;; Sua: unaligned memory address +;; Sua: simple or post-inc address (for unaligned load) ;; Sbv: QImode address without displacement ;; Sbw: QImode address with 12 bit displacement ;; Snd: address without displacement @@ -260,16 +262,36 @@ (match_test "~ival == 64") (match_test "~ival == 128")))) +;; FIXME: LRA and reload behavior differs in memory constraint handling. +;; For LRA memory address constraints need to narrow the register type +;; restrictions. It seems the address RTX validation is done slightly +;; differently. 
Remove the non-LRA paths eventually. +(define_constraint "Rab" + "@internal address base register constraint" + (ior (and (match_test "sh_lra_p ()") + (match_test "MAYBE_BASE_REGISTER_RTX_P (op, false)")) + (and (match_test "!sh_lra_p ()") + (match_code "reg")))) + +(define_constraint "Rai" + "@internal address index register constraint" + (ior (and (match_test "sh_lra_p ()") + (match_test "MAYBE_INDEX_REGISTER_RTX_P (op, false)")) + (and (match_test "!sh_lra_p ()") + (match_code "reg")))) + (define_memory_constraint "Sua" - "@internal" - (and (match_test "memory_operand (op, GET_MODE (op))") - (match_test "GET_CODE (XEXP (op, 0)) != PLUS"))) + "A memory reference that allows simple register or post-inc addressing." + (and (match_code "mem") + (ior (match_test "satisfies_constraint_Rab (XEXP (op, 0))") + (and (match_code "post_inc" "0") + (match_test "satisfies_constraint_Rab (XEXP (XEXP (op, 0), 0))"))))) (define_memory_constraint "Sdd" "A memory reference that uses displacement addressing." (and (match_code "mem") (match_code "plus" "0") - (match_code "reg" "00") + (match_test "satisfies_constraint_Rab (XEXP (XEXP (op, 0), 0))") (match_code "const_int" "01"))) (define_memory_constraint "Snd" @@ -281,19 +303,28 @@ "A memory reference that uses index addressing." (and (match_code "mem") (match_code "plus" "0") - (match_code "reg" "00") - (match_code "reg" "01"))) + (ior (and (match_test "satisfies_constraint_Rab (XEXP (XEXP (op, 0), 0))") + (match_test "satisfies_constraint_Rai (XEXP (XEXP (op, 0), 1))")) + (and (match_test "satisfies_constraint_Rab (XEXP (XEXP (op, 0), 1))") + (match_test "satisfies_constraint_Rai (XEXP (XEXP (op, 0), 0))"))))) (define_memory_constraint "Ssd" "A memory reference that excludes index and displacement addressing." - (and (match_code "mem") - (match_test "! satisfies_constraint_Sid (op)") - (match_test "! satisfies_constraint_Sdd (op)"))) + (ior (and (match_code "mem") + (match_test "! sh_lra_p ()") + (match_test "! 
satisfies_constraint_Sid (op)") + (match_test "! satisfies_constraint_Sdd (op)")) + (and (match_code "mem") + (match_test "sh_lra_p ()") + (ior (match_test "satisfies_constraint_Rab (XEXP (op, 0))") + (and (ior (match_code "pre_dec" "0") (match_code "post_inc" "0")) + (match_test "satisfies_constraint_Rab (XEXP (XEXP (op, 0), 0))")))))) (define_memory_constraint "Sbv" "A memory reference, as used in SH2A bclr.b, bset.b, etc." - (and (match_test "MEM_P (op) && GET_MODE (op) == QImode") - (match_test "REG_P (XEXP (op, 0))"))) + (and (match_code "mem") + (match_test "GET_MODE (op) == QImode") + (match_test "satisfies_constraint_Rab (XEXP (op, 0))"))) (define_memory_constraint "Sbw" "A memory reference, as used in SH2A bclr.b, bset.b, etc." @@ -304,13 +335,17 @@ (define_memory_constraint "Sra" "A memory reference that uses simple register addressing." (and (match_code "mem") - (match_code "reg" "0"))) + (match_test "satisfies_constraint_Rab (XEXP (op, 0))"))) + +(define_memory_constraint "Sgb" + "A memory renference that uses GBR addressing." + (match_test "gbr_address_mem (op, GET_MODE (op))")) (define_memory_constraint "Ara" "A memory reference that uses simple register addressing suitable for gusa atomic operations." (and (match_code "mem") - (match_code "reg" "0") + (match_test "satisfies_constraint_Rab (XEXP (op, 0))") (match_test "REGNO (XEXP (op, 0)) != SP_REG"))) (define_memory_constraint "Add" @@ -319,6 +354,6 @@ (and (match_code "mem") (match_test "GET_MODE (op) == SImode") (match_code "plus" "0") - (match_code "reg" "00") + (match_test "satisfies_constraint_Rab (XEXP (XEXP (op, 0), 0))") (match_code "const_int" "01") (match_test "REGNO (XEXP (XEXP (op, 0), 0)) != SP_REG"))) diff --git a/gcc/config/sh/predicates.md b/gcc/config/sh/predicates.md index f7051087213..868fdb9d57f 100644 --- a/gcc/config/sh/predicates.md +++ b/gcc/config/sh/predicates.md @@ -208,8 +208,7 @@ ;; Returns 1 if OP is a simple register address. 
(define_predicate "simple_mem_operand" (and (match_code "mem") - (match_code "reg" "0") - (match_test "arith_reg_operand (XEXP (op, 0), SImode)"))) + (match_test "satisfies_constraint_Sra (op)"))) ;; Returns 1 if OP is a valid displacement address. (define_predicate "displacement_mem_operand" @@ -239,13 +238,13 @@ (define_predicate "post_inc_mem" (and (match_code "mem") (match_code "post_inc" "0") - (match_code "reg" "00"))) + (match_test "satisfies_constraint_Rab (XEXP (XEXP (op, 0), 0))"))) ;; Returns true if OP is a pre-decrement addressing mode memory reference. (define_predicate "pre_dec_mem" (and (match_code "mem") (match_code "pre_dec" "0") - (match_code "reg" "00"))) + (match_test "satisfies_constraint_Rab (XEXP (XEXP (op, 0), 0))"))) ;; Returns 1 if the operand can be used in an SH2A movu.{b|w} insn. (define_predicate "zero_extend_movu_operand" @@ -485,6 +484,17 @@ && sh_legitimate_index_p (mode, XEXP (plus0_rtx, 1), TARGET_SH2A, true); }) +;; Returns true if OP is a pc relative load operand. +(define_predicate "pc_relative_load_operand" + (match_code "mem") +{ + if (GET_MODE (op) != QImode + && IS_PC_RELATIVE_LOAD_ADDR_P (XEXP (op, 0))) + return true; + + return false; +}) + ;; Returns true if OP is a valid source operand for a logical operation. (define_predicate "logical_operand" (and (match_code "subreg,reg,const_int") @@ -805,3 +815,22 @@ return false; }) + +;; Predicats for the arguments of sfunc R4, R5 and R6. 
+(define_predicate "hard_reg_r4" + (match_code "reg") +{ + return REGNO (op) == R4_REG; +}) + +(define_predicate "hard_reg_r5" + (match_code "reg") +{ + return REGNO (op) == R5_REG; +}) + +(define_predicate "hard_reg_r6" + (match_code "reg") +{ + return REGNO (op) == R6_REG; +}) diff --git a/gcc/config/sh/sh-mem.cc b/gcc/config/sh/sh-mem.cc index 980b55a3a6c..bda7ff3d9da 100644 --- a/gcc/config/sh/sh-mem.cc +++ b/gcc/config/sh/sh-mem.cc @@ -134,7 +134,7 @@ expand_block_move (rtx *operands) int dwords = bytes >> 3; emit_insn (gen_move_insn (r6, GEN_INT (dwords - 1))); - emit_insn (gen_block_lump_real_i4 (func_addr_rtx, lab)); + emit_insn (gen_block_lump_real_i4 (func_addr_rtx, lab, r4, r5, r6)); return true; } else @@ -178,7 +178,7 @@ expand_block_move (rtx *operands) final_switch = 16 - ((bytes / 4) % 16); while_loop = ((bytes / 4) / 16 - 1) * 16; emit_insn (gen_move_insn (r6, GEN_INT (while_loop + final_switch))); - emit_insn (gen_block_lump_real (func_addr_rtx, lab)); + emit_insn (gen_block_lump_real (func_addr_rtx, lab, r4, r5, r6)); return true; } diff --git a/gcc/config/sh/sh-protos.h b/gcc/config/sh/sh-protos.h index 41ab6101ae1..a2a374f5f31 100644 --- a/gcc/config/sh/sh-protos.h +++ b/gcc/config/sh/sh-protos.h @@ -61,6 +61,7 @@ extern rtx legitimize_pic_address (rtx, machine_mode, rtx); extern bool nonpic_symbol_mentioned_p (rtx); extern void output_pic_addr_const (FILE *, rtx); extern bool expand_block_move (rtx *); +extern bool sh_satisfies_constraint_Sid_subreg_index (rtx); extern void prepare_move_operands (rtx[], machine_mode mode); extern bool sh_expand_cmpstr (rtx *); extern bool sh_expand_cmpnstr (rtx *); @@ -102,7 +103,8 @@ extern rtx sh_find_equiv_gbr_addr (rtx_insn* cur_insn, rtx mem); extern int sh_eval_treg_value (rtx op); extern HOST_WIDE_INT sh_disp_addr_displacement (rtx mem_op); extern int sh_max_mov_insn_displacement (machine_mode mode, bool consider_sh2a); -extern bool sh_movsf_ie_ra_split_p (rtx, rtx, rtx); +extern bool 
sh_movsf_ie_y_split_p (rtx, rtx); +extern bool sh_movsf_ie_subreg_multiword_p (rtx, rtx); extern void sh_expand_sym_label2reg (rtx, rtx, rtx, bool); /* Result value of sh_find_set_of_reg. */ diff --git a/gcc/config/sh/sh.cc b/gcc/config/sh/sh.cc index d9b319c2377..4c42abe21d3 100644 --- a/gcc/config/sh/sh.cc +++ b/gcc/config/sh/sh.cc @@ -271,6 +271,7 @@ static bool sh_legitimate_address_p (machine_mode, rtx, bool, static rtx sh_legitimize_address (rtx, rtx, machine_mode); static rtx sh_delegitimize_address (rtx); static bool sh_cannot_substitute_mem_equiv_p (rtx); +static bool sh_cannot_substitute_const_equiv_p (rtx); static bool sh_legitimize_address_displacement (rtx *, rtx *, poly_int64, machine_mode); static int scavenge_reg (HARD_REG_SET *s); @@ -612,6 +613,9 @@ TARGET_GNU_ATTRIBUTES (sh_attribute_table, #undef TARGET_CANNOT_SUBSTITUTE_MEM_EQUIV_P #define TARGET_CANNOT_SUBSTITUTE_MEM_EQUIV_P sh_cannot_substitute_mem_equiv_p +#undef TARGET_CANNOT_SUBSTITUTE_CONST_EQUIV_P +#define TARGET_CANNOT_SUBSTITUTE_CONST_EQUIV_P sh_cannot_substitute_const_equiv_p + #undef TARGET_LEGITIMIZE_ADDRESS_DISPLACEMENT #define TARGET_LEGITIMIZE_ADDRESS_DISPLACEMENT \ sh_legitimize_address_displacement @@ -1580,6 +1584,18 @@ sh_encode_section_info (tree decl, rtx rtl, int first) SYMBOL_REF_FLAGS (XEXP (rtl, 0)) |= SYMBOL_FLAG_FUNCVEC_FUNCTION; } +/* Test Sid constraint with subreg index. See also the comment in + prepare_move_operands. */ +bool +sh_satisfies_constraint_Sid_subreg_index (rtx op) +{ + return ((GET_CODE (op) == MEM) + && ((GET_CODE (XEXP (op, 0)) == PLUS) + && ((GET_CODE (XEXP (XEXP (op, 0), 0)) == REG) + && ((GET_CODE (XEXP (XEXP (op, 0), 1)) == SUBREG) + && (GET_CODE (XEXP (XEXP (XEXP (op, 0), 1), 0)) == REG))))); +} + /* Prepare operands for a move define_expand; specifically, one of the operands must be in a register. */ void @@ -4829,6 +4845,7 @@ broken_move (rtx_insn *insn) we changed this to do a constant load. 
In that case we don't have an r0 clobber, hence we must use fldi. */ && (TARGET_FMOVD + || sh_lra_p () || (GET_CODE (XEXP (XVECEXP (PATTERN (insn), 0, 2), 0)) == SCRATCH)) && REG_P (SET_DEST (pat)) @@ -11431,6 +11448,19 @@ sh_cannot_substitute_mem_equiv_p (rtx) return true; } +static bool +sh_cannot_substitute_const_equiv_p (rtx subst) +{ + /* If SUBST is SFmode const_double 0 or 1, the move insn may be + transformed into fldi0/1. This is unsafe for fp mode switching + because fldi0/1 are single mode only instructions. */ + if (GET_MODE (subst) == SFmode + && (real_equal (CONST_DOUBLE_REAL_VALUE (subst), &dconst1) + || real_equal (CONST_DOUBLE_REAL_VALUE (subst), &dconst0))) + return true; + return false; +} + /* Implement TARGET_LEGITIMIZE_ADDRESS_DISPLACEMENT. */ static bool sh_legitimize_address_displacement (rtx *offset1, rtx *offset2, @@ -11452,30 +11482,41 @@ sh_legitimize_address_displacement (rtx *offset1, rtx *offset2, return false; } -/* Return true if movsf insn should be splited with an additional - register. */ +/* Return true if movsf insn should be splited with fpul register. */ bool -sh_movsf_ie_ra_split_p (rtx op0, rtx op1, rtx op2) +sh_movsf_ie_y_split_p (rtx op0, rtx op1) { - /* op0 == op1 */ - if (rtx_equal_p (op0, op1)) + /* f, r */ + if (REG_P (op0) + && (SUBREG_P (op1) + && (GET_MODE (SUBREG_REG (op1)) == SImode + || GET_MODE (SUBREG_REG (op1)) == DImode))) return true; - /* fy, FQ, reg */ - if (GET_CODE (op1) == CONST_DOUBLE - && ! satisfies_constraint_G (op1) - && ! satisfies_constraint_H (op1) - && REG_P (op0) - && REG_P (op2)) + /* r, f */ + if (REG_P (op1) + && (SUBREG_P (op0) + && (GET_MODE (SUBREG_REG (op0)) == SImode + || GET_MODE (SUBREG_REG (op0)) == DImode))) return true; - /* f, r, y */ - if (REG_P (op0) && FP_REGISTER_P (REGNO (op0)) - && REG_P (op1) && GENERAL_REGISTER_P (REGNO (op1)) - && REG_P (op2) && (REGNO (op2) == FPUL_REG)) + + return false; +} + +/* Return true if it moves reg from/to subreg of multiword mode. 
*/ +bool +sh_movsf_ie_subreg_multiword_p (rtx op0, rtx op1) +{ + if (REG_P (op0) + && (SUBREG_P (op1) + && (GET_MODE (SUBREG_REG (op1)) == SCmode + || GET_MODE (SUBREG_REG (op1)) == DImode + || GET_MODE (SUBREG_REG (op1)) == TImode))) return true; - /* r, f, y */ - if (REG_P (op1) && FP_REGISTER_P (REGNO (op1)) - && REG_P (op0) && GENERAL_REGISTER_P (REGNO (op0)) - && REG_P (op2) && (REGNO (op2) == FPUL_REG)) + if (REG_P (op1) + && (SUBREG_P (op0) + && (GET_MODE (SUBREG_REG (op0)) == SCmode + || GET_MODE (SUBREG_REG (op0)) == DImode + || GET_MODE (SUBREG_REG (op0)) == TImode))) return true; return false; diff --git a/gcc/config/sh/sh.md b/gcc/config/sh/sh.md index 75ec87b8851..6d311f7a426 100644 --- a/gcc/config/sh/sh.md +++ b/gcc/config/sh/sh.md @@ -2194,13 +2194,24 @@ ;; there is nothing to prevent reload from using r0 to reload the address. ;; This reload would clobber the value in r0 we are trying to store. ;; If we let reload allocate r0, then this problem can never happen. +;; +;; In addition to that, we also must pin the input regs to hard-regs via the +;; predicates. When these insns are instantiated it also emits the +;; accompanying mov insns to load the hard-regs. However, subsequent RTL +;; passes might move things around and reassign the operands to pseudo regs +;; which might get allocated to different (wrong) hard-regs eventually. To +;; avoid that, only allow matching these insns if the operands are the +;; expected hard-regs. 
(define_insn "udivsi3_i1" [(set (match_operand:SI 0 "register_operand" "=z,z") - (udiv:SI (reg:SI R4_REG) (reg:SI R5_REG))) + (udiv:SI (match_operand:SI 3 "hard_reg_r4" "=r,r") + (match_operand:SI 4 "hard_reg_r5" "=r,r"))) (clobber (reg:SI T_REG)) (clobber (reg:SI PR_REG)) (clobber (reg:SI R1_REG)) - (clobber (reg:SI R4_REG)) + (clobber (match_dup 3)) + (use (reg:SI R4_REG)) + (use (reg:SI R5_REG)) (use (match_operand:SI 1 "arith_reg_operand" "r,r")) (use (match_operand 2 "" "Z,Ccl"))] "TARGET_SH1 && TARGET_DIVIDE_CALL_DIV1" @@ -2212,7 +2223,8 @@ (define_insn "udivsi3_i4" [(set (match_operand:SI 0 "register_operand" "=y,y") - (udiv:SI (reg:SI R4_REG) (reg:SI R5_REG))) + (udiv:SI (match_operand:SI 3 "hard_reg_r4" "=r,r") + (match_operand:SI 4 "hard_reg_r5" "=r,r"))) (clobber (reg:SI T_REG)) (clobber (reg:SI PR_REG)) (clobber (reg:DF DR0_REG)) @@ -2220,9 +2232,11 @@ (clobber (reg:DF DR4_REG)) (clobber (reg:SI R0_REG)) (clobber (reg:SI R1_REG)) - (clobber (reg:SI R4_REG)) - (clobber (reg:SI R5_REG)) + (clobber (match_dup 3)) + (clobber (match_dup 4)) (clobber (reg:SI FPSCR_STAT_REG)) + (use (reg:SI R4_REG)) + (use (reg:SI R5_REG)) (use (match_operand:SI 1 "arith_reg_operand" "r,r")) (use (match_operand 2 "" "Z,Ccl")) (use (reg:SI FPSCR_MODES_REG))] @@ -2236,7 +2250,8 @@ (define_insn "udivsi3_i4_single" [(set (match_operand:SI 0 "register_operand" "=y,y") - (udiv:SI (reg:SI R4_REG) (reg:SI R5_REG))) + (udiv:SI (match_operand:SI 3 "hard_reg_r4" "=r,r") + (match_operand:SI 4 "hard_reg_r5" "=r,r"))) (clobber (reg:SI T_REG)) (clobber (reg:SI PR_REG)) (clobber (reg:DF DR0_REG)) @@ -2244,8 +2259,10 @@ (clobber (reg:DF DR4_REG)) (clobber (reg:SI R0_REG)) (clobber (reg:SI R1_REG)) - (clobber (reg:SI R4_REG)) - (clobber (reg:SI R5_REG)) + (clobber (match_dup 3)) + (clobber (match_dup 4)) + (use (reg:SI R4_REG)) + (use (reg:SI R5_REG)) (use (match_operand:SI 1 "arith_reg_operand" "r,r")) (use (match_operand 2 "" "Z,Ccl"))] "TARGET_FPU_ANY && TARGET_FPU_SINGLE" @@ -2278,6 +2295,8 
@@ { rtx last; rtx func_ptr = gen_reg_rtx (Pmode); + rtx r4 = gen_rtx_REG (SImode, R4_REG); + rtx r5 = gen_rtx_REG (SImode, R5_REG); /* Emit the move of the address to a pseudo outside of the libcall. */ if (TARGET_DIVIDE_CALL_TABLE) @@ -2305,9 +2324,9 @@ { rtx lab = function_symbol (func_ptr, "__udivsi3_i4", SFUNC_STATIC).lab; if (TARGET_FPU_SINGLE) - last = gen_udivsi3_i4_single (operands[0], func_ptr, lab); + last = gen_udivsi3_i4_single (operands[0], func_ptr, lab, r4, r5); else - last = gen_udivsi3_i4 (operands[0], func_ptr, lab); + last = gen_udivsi3_i4 (operands[0], func_ptr, lab, r4, r5); } else if (TARGET_SH2A) { @@ -2319,10 +2338,10 @@ else { rtx lab = function_symbol (func_ptr, "__udivsi3", SFUNC_STATIC).lab; - last = gen_udivsi3_i1 (operands[0], func_ptr, lab); + last = gen_udivsi3_i1 (operands[0], func_ptr, lab, r4, r5); } - emit_move_insn (gen_rtx_REG (SImode, 4), operands[1]); - emit_move_insn (gen_rtx_REG (SImode, 5), operands[2]); + emit_move_insn (r4, operands[1]); + emit_move_insn (r5, operands[2]); emit_insn (last); DONE; }) @@ -4820,7 +4839,38 @@ (define_expand "extend<mode>si2" [(set (match_operand:SI 0 "arith_reg_dest") - (sign_extend:SI (match_operand:QIHI 1 "general_extend_operand")))]) + (sign_extend:SI (match_operand:QIHI 1 "general_extend_operand")))] + "" +{ + /* When the displacement addressing is used, RA will assign r0 to + the pseudo register operand for the QI/HImode load. See + the comment in sh.cc:prepare_move_operand and PR target/55212. */ + if (! lra_in_progress && ! reload_completed + && sh_lra_p () + && ! 
TARGET_SH2A + && arith_reg_dest (operands[0], <MODE>mode) + && short_displacement_mem_operand (operands[1], <MODE>mode)) + { + emit_insn (gen_extend<mode>si2_short_mem_disp_z (operands[0], + operands[1])); + DONE; + } +}) + +(define_insn_and_split "extend<mode>si2_short_mem_disp_z" + [(set (match_operand:SI 0 "arith_reg_dest" "=r") + (sign_extend:SI + (match_operand:QIHI 1 "short_displacement_mem_operand" "m"))) + (clobber (reg:SI R0_REG))] + "TARGET_SH1 && ! TARGET_SH2A && sh_lra_p ()" + "#" + "&& 1" + [(set (match_dup 2) (sign_extend:SI (match_dup 1))) + (set (match_dup 0) (match_dup 2))] +{ + operands[2] = gen_rtx_REG (SImode, R0_REG); +} + [(set_attr "type" "load")]) (define_insn_and_split "*extend<mode>si2_compact_reg" [(set (match_operand:SI 0 "arith_reg_dest" "=r") @@ -5362,9 +5412,50 @@ operands[1] = gen_lowpart (<MODE>mode, reg); } + if (! lra_in_progress && ! reload_completed + && sh_lra_p () + && ! TARGET_SH2A + && arith_reg_operand (operands[1], <MODE>mode) + && (satisfies_constraint_Sid (operands[0]) + || sh_satisfies_constraint_Sid_subreg_index (operands[0]))) + { + rtx adr = XEXP (operands[0], 0); + rtx base = XEXP (adr, 0); + rtx idx = XEXP (adr, 1); + emit_insn (gen_mov<mode>_store_mem_index (base, idx, + operands[1])); + DONE; + } + prepare_move_operands (operands, <MODE>mode); }) +(define_insn "*mov<mode>_store_mem_index" + [(set (mem:QIHI + (plus:SI (match_operand:SI 0 "arith_reg_operand" "%r") + (match_operand:SI 1 "arith_reg_operand" "z"))) + (match_operand:QIHI 2 "arith_reg_operand" "r"))] + "TARGET_SH1 && ! TARGET_SH2A && sh_lra_p () + && REG_P (operands[1]) && REGNO (operands[1]) == R0_REG" + "mov.<bw> %2,@(%1,%0)" + [(set_attr "type" "store")]) + +(define_insn_and_split "mov<mode>_store_mem_index" + [(set (mem:QIHI + (plus:SI (match_operand:SI 0 "arith_reg_operand" "%r") + (match_operand:SI 1 "arith_reg_operand" "^zr"))) + (match_operand:QIHI 2 "arith_reg_operand" "r")) + (clobber (reg:SI R0_REG))] + "TARGET_SH1 && ! 
TARGET_SH2A && sh_lra_p ()" + "#" + "&& 1" + [(set (match_dup 3) (match_dup 1)) + (set (mem:QIHI (plus:SI (match_dup 0) (match_dup 3))) (match_dup 2))] +{ + operands[3] = gen_rtx_REG (SImode, R0_REG); +} + [(set_attr "type" "store")]) + ;; The pre-dec and post-inc mems must be captured by the '<' and '>' ;; constraints, otherwise wrong code might get generated. (define_insn "*mov<mode>_load_predec" @@ -5650,6 +5741,22 @@ (const_string "double") (const_string "none")))]) +;; LRA will try to satisfy the constraints in match_scratch for the memory +;; displacements and it will make issues on this target. Use R0 as a scratch +;; register for the constant load. +(define_insn "movdf_i4_F_z" + [(set (match_operand:DF 0 "fp_arith_reg_operand" "=d") + (match_operand:DF 1 "const_double_operand" "F")) + (use (reg:SI FPSCR_MODES_REG)) + (clobber (reg:SI R0_REG))] + "TARGET_FPU_DOUBLE && sh_lra_p ()" + "#" + [(set_attr "type" "pcfload") + (set (attr "length") (if_then_else (eq_attr "fmovd" "yes") (const_int 4) (const_int 8))) + (set (attr "fp_mode") (if_then_else (eq_attr "fmovd" "yes") + (const_string "double") + (const_string "none")))]) + ;; Moving DFmode between fp/general registers through memory ;; (the top of the stack) is faster than moving through fpul even for ;; little endian. 
Because the type of an instruction is important for its @@ -5789,6 +5896,15 @@ [(set (match_dup 0) (match_dup 0))] "") +(define_split + [(set (match_operand:SF 0 "register_operand" "") + (match_operand:SF 1 "register_operand" "")) + (use (reg:SI FPSCR_MODES_REG))] + "TARGET_SH2E && sh_lra_p () && reload_completed + && true_regnum (operands[0]) == true_regnum (operands[1])" + [(set (match_dup 0) (match_dup 0))] + "") + ;; fmovd substitute post-reload splits (define_split [(set (match_operand:DF 0 "register_operand" "") @@ -6033,6 +6149,14 @@ prepare_move_operands (operands, DFmode); if (TARGET_FPU_DOUBLE) { + if (sh_lra_p () + && (GET_CODE (operands[1]) == CONST_DOUBLE + && REG_P (operands[0]))) + { + emit_insn (gen_movdf_i4_F_z (operands[0], operands[1])); + DONE; + } + emit_insn (gen_movdf_i4 (operands[0], operands[1])); DONE; } @@ -6158,15 +6282,17 @@ (const_string "none") (const_string "none")])]) -(define_insn_and_split "movsf_ie_ra" +;; LRA will try to satisfy the constraints in match_scratch for the memory +;; displacements and it will make issues on this target. movsf_ie is splitted +;; into 4 patterns to avoid it when lra_in_progress is true. +(define_insn "movsf_ie_ra" [(set (match_operand:SF 0 "general_movdst_operand" - "=f,r,f,f,fy,f,m, r,r,m,f,y,y,rf,r,y,<,y,y") + "=f,r,f,f,f,m, r,r,m,f,y,y,r,y,<,y,y") (match_operand:SF 1 "general_movsrc_operand" - " f,r,G,H,FQ,m,f,FQ,m,r,y,f,>,fr,y,r,y,>,y")) - (use (reg:SI FPSCR_MODES_REG)) - (clobber (match_scratch:SF 2 "=r,r,X,X,&z,r,r, X,r,r,r,r,r, y,r,r,r,r,r")) - (const_int 0)] - "TARGET_SH2E + " f,r,G,H,m,f,FQ,m,r,y,f,>,y,r,y,>,y")) + (use (reg:SI FPSCR_MODES_REG))] + "TARGET_SH2E && sh_lra_p () + && ! 
sh_movsf_ie_y_split_p (operands[0], operands[1]) && (arith_reg_operand (operands[0], SFmode) || fpul_operand (operands[0], SFmode) || arith_reg_operand (operands[1], SFmode) @@ -6176,7 +6302,6 @@ mov %1,%0 fldi0 %0 fldi1 %0 - # fmov.s %1,%0 fmov.s %1,%0 mov.l %1,%0 @@ -6185,31 +6310,19 @@ fsts fpul,%0 flds %1,fpul lds.l %1,%0 - # sts %1,%0 lds %1,%0 sts.l %1,%0 lds.l %1,%0 ! move optimized away" - "reload_completed - && sh_movsf_ie_ra_split_p (operands[0], operands[1], operands[2])" - [(const_int 0)] -{ - if (! rtx_equal_p (operands[0], operands[1])) - { - emit_insn (gen_movsf_ie (operands[2], operands[1])); - emit_insn (gen_movsf_ie (operands[0], operands[2])); - } -} - [(set_attr "type" "fmove,move,fmove,fmove,pcfload,fload,fstore,pcload,load, - store,fmove,fmove,load,*,fpul_gp,gp_fpul,fstore,load,nil") - (set_attr "late_fp_use" "*,*,*,*,*,*,yes,*,*,*,*,*,*,*,yes,*,yes,*,*") + [(set_attr "type" "fmove,move,fmove,fmove,fload,fstore,pcload,load, + store,fmove,fmove,load,fpul_gp,gp_fpul,fstore,load,nil") + (set_attr "late_fp_use" "*,*,*,*,*,yes,*,*,*,*,*,*,yes,*,yes,*,*") (set_attr_alternative "length" [(const_int 2) (const_int 2) (const_int 2) (const_int 2) - (const_int 4) (if_then_else (match_operand 1 "displacement_mem_operand") (const_int 4) (const_int 2)) (if_then_else (match_operand 0 "displacement_mem_operand") @@ -6222,7 +6335,6 @@ (const_int 2) (const_int 2) (const_int 2) - (const_int 4) (const_int 2) (const_int 2) (const_int 2) @@ -6234,7 +6346,6 @@ (const_string "none") (const_string "single") (const_string "single") - (const_string "none") (if_then_else (eq_attr "fmovd" "yes") (const_string "single") (const_string "none")) (if_then_else (eq_attr "fmovd" "yes") @@ -6249,15 +6360,75 @@ (const_string "none") (const_string "none") (const_string "none") + (const_string "none")])]) + +(define_insn_and_split "movsf_ie_rffr" + [(set (match_operand:SF 0 "arith_reg_dest" "=f,r,rf") + (match_operand:SF 1 "arith_reg_operand" "f,r,fr")) + (use (reg:SI 
FPSCR_MODES_REG)) + (clobber (match_scratch:SF 2 "=X,X,y"))] + "TARGET_SH2E && sh_lra_p ()" + "@ + fmov %1,%0 + mov %1,%0 + #" + "reload_completed + && (FP_REGISTER_P (REGNO (operands[0])) + != FP_REGISTER_P (REGNO (operands[1])))" + [(const_int 0)] +{ + emit_insn (gen_movsf_ie_ra (operands[2], operands[1])); + emit_insn (gen_movsf_ie_ra (operands[0], operands[2])); +} + [(set_attr "type" "fmove,move,*") + (set_attr_alternative "length" + [(const_int 2) + (const_int 2) + (const_int 4)]) + (set_attr_alternative "fp_mode" + [(if_then_else (eq_attr "fmovd" "yes") + (const_string "single") (const_string "none")) (const_string "none") (const_string "none")])]) +(define_insn "movsf_ie_F_z" + [(set (match_operand:SF 0 "fp_arith_reg_operand" "=f") + (match_operand:SF 1 "const_double_operand" "F")) + (use (reg:SI FPSCR_MODES_REG)) + (clobber (reg:SI R0_REG))] + "TARGET_SH2E && sh_lra_p ()" + "#" + [(set_attr "type" "pcfload") + (set_attr "length" "4")]) + +(define_insn "movsf_ie_Q_z" + [(set (match_operand:SF 0 "fpul_operand" "=y") + (match_operand:SF 1 "pc_relative_load_operand" "Q")) + (use (reg:SI FPSCR_MODES_REG)) + (clobber (reg:SI R0_REG))] + "TARGET_SH2E && sh_lra_p ()" + "#" + [(set_attr "type" "pcfload") + (set_attr "length" "4")]) + +(define_insn "movsf_ie_y" + [(set (match_operand:SF 0 "arith_reg_dest" "=fr") + (match_operand:SF 1 "arith_reg_operand" "rf")) + (use (reg:SI FPSCR_MODES_REG)) + (clobber (reg:SI FPUL_REG))] + "TARGET_SH2E && sh_lra_p ()" + "#" + [(set_attr "type" "*") + (set_attr "length" "4")]) + (define_split [(set (match_operand:SF 0 "register_operand" "") (match_operand:SF 1 "register_operand" "")) (use (reg:SI FPSCR_MODES_REG)) (clobber (reg:SI FPUL_REG))] - "TARGET_SH1" + "TARGET_SH1 + && ! fpul_operand (operands[0], SFmode) + && ! 
fpul_operand (operands[1], SFmode)" [(parallel [(set (reg:SF FPUL_REG) (match_dup 1)) (use (reg:SI FPSCR_MODES_REG)) (clobber (scratch:SI))]) @@ -6274,11 +6445,37 @@ prepare_move_operands (operands, SFmode); if (TARGET_SH2E) { - if (lra_in_progress) + if (sh_lra_p ()) { if (GET_CODE (operands[0]) == SCRATCH) DONE; - emit_insn (gen_movsf_ie_ra (operands[0], operands[1])); + /* reg from/to multiword subreg may be splitted to several reg from/to + subreg of SImode by subreg1 pass. This confuses our splitted + movsf logic for LRA and will end up in bad code or ICE. Use a special + pattern so that LRA can optimize this case. */ + if (! lra_in_progress && ! reload_completed + && sh_movsf_ie_subreg_multiword_p (operands[0], operands[1])) + { + emit_insn (gen_movsf_ie_rffr (operands[0], operands[1])); + DONE; + } + if (GET_CODE (operands[1]) == CONST_DOUBLE + && ! satisfies_constraint_G (operands[1]) + && ! satisfies_constraint_H (operands[1]) + && REG_P (operands[0])) + emit_insn (gen_movsf_ie_F_z (operands[0], operands[1])); + else if ((REG_P (operands[0]) && REGNO (operands[0]) == FPUL_REG) + && satisfies_constraint_Q (operands[1])) + emit_insn (gen_movsf_ie_Q_z (operands[0], operands[1])); + else if (sh_movsf_ie_y_split_p (operands[0], operands[1])) + { + if (lra_in_progress) + emit_insn (gen_movsf_ie (operands[0], operands[1])); + else + emit_insn (gen_movsf_ie_y (operands[0], operands[1])); + } + else + emit_insn (gen_movsf_ie_ra (operands[0], operands[1])); DONE; } @@ -8970,17 +9167,20 @@ (set_attr "needs_delay_slot" "yes")]) (define_insn "block_lump_real" - [(parallel [(set (mem:BLK (reg:SI R4_REG)) - (mem:BLK (reg:SI R5_REG))) - (use (match_operand:SI 0 "arith_reg_operand" "r,r")) - (use (match_operand 1 "" "Z,Ccl")) - (use (reg:SI R6_REG)) - (clobber (reg:SI PR_REG)) - (clobber (reg:SI T_REG)) - (clobber (reg:SI R4_REG)) - (clobber (reg:SI R5_REG)) - (clobber (reg:SI R6_REG)) - (clobber (reg:SI R0_REG))])] + [(set (mem:BLK (match_operand:SI 2 "hard_reg_r4" 
"=r,r")) + (mem:BLK (match_operand:SI 3 "hard_reg_r5" "=r,r"))) + (use (match_operand:SI 0 "arith_reg_operand" "r,r")) + (use (match_operand 1 "" "Z,Ccl")) + (use (match_operand:SI 4 "hard_reg_r6" "=r,r")) + (use (reg:SI R4_REG)) + (use (reg:SI R5_REG)) + (use (reg:SI R6_REG)) + (clobber (match_dup 2)) + (clobber (match_dup 3)) + (clobber (match_dup 4)) + (clobber (reg:SI PR_REG)) + (clobber (reg:SI T_REG)) + (clobber (reg:SI R0_REG))] "TARGET_SH1 && ! TARGET_HARD_SH4" "@ jsr @%0%# @@ -9005,20 +9205,23 @@ (set_attr "needs_delay_slot" "yes")]) (define_insn "block_lump_real_i4" - [(parallel [(set (mem:BLK (reg:SI R4_REG)) - (mem:BLK (reg:SI R5_REG))) - (use (match_operand:SI 0 "arith_reg_operand" "r,r")) - (use (match_operand 1 "" "Z,Ccl")) - (use (reg:SI R6_REG)) - (clobber (reg:SI PR_REG)) - (clobber (reg:SI T_REG)) - (clobber (reg:SI R4_REG)) - (clobber (reg:SI R5_REG)) - (clobber (reg:SI R6_REG)) - (clobber (reg:SI R0_REG)) - (clobber (reg:SI R1_REG)) - (clobber (reg:SI R2_REG)) - (clobber (reg:SI R3_REG))])] + [(set (mem:BLK (match_operand:SI 2 "hard_reg_r4" "=r,r")) + (mem:BLK (match_operand:SI 3 "hard_reg_r5" "=r,r"))) + (use (match_operand:SI 0 "arith_reg_operand" "r,r")) + (use (match_operand 1 "" "Z,Ccl")) + (use (match_operand:SI 4 "hard_reg_r6" "=r,r")) + (use (reg:SI R4_REG)) + (use (reg:SI R5_REG)) + (use (reg:SI R6_REG)) + (clobber (match_dup 2)) + (clobber (match_dup 3)) + (clobber (match_dup 4)) + (clobber (reg:SI PR_REG)) + (clobber (reg:SI T_REG)) + (clobber (reg:SI R0_REG)) + (clobber (reg:SI R1_REG)) + (clobber (reg:SI R2_REG)) + (clobber (reg:SI R3_REG))] "TARGET_HARD_SH4" "@ jsr @%0%# diff --git a/gcc/config/sh/sh.opt b/gcc/config/sh/sh.opt index 1ef494d4df4..e5ab6c42a0c 100644 --- a/gcc/config/sh/sh.opt +++ b/gcc/config/sh/sh.opt @@ -299,5 +299,5 @@ Target Var(TARGET_FSRRA) Enable the use of the fsrra instruction. mlra -Target Var(sh_lra_flag) Init(0) Save +Target Var(sh_lra_flag) Init(1) Save Use LRA instead of reload (transitional). 
diff --git a/gcc/config/sh/sync.md b/gcc/config/sh/sync.md index 2eca3accbc8..c89e6e919bb 100644 --- a/gcc/config/sh/sync.md +++ b/gcc/config/sh/sync.md @@ -217,7 +217,9 @@ (and (match_test "mode == SImode") (and (match_test "!TARGET_ATOMIC_HARD_LLCS") (match_test "!TARGET_SH4A || TARGET_ATOMIC_STRICT")) - (match_operand 0 "short_displacement_mem_operand"))))) + (match_operand 0 "short_displacement_mem_operand"))) + (ior (match_test "!TARGET_ATOMIC_HARD_LLCS") + (not (match_operand 0 "gbr_address_mem"))))) (define_expand "atomic_compare_and_swap<mode>" [(match_operand:SI 0 "arith_reg_dest") ;; bool success output @@ -715,7 +717,9 @@ && TARGET_SH4A && !TARGET_ATOMIC_STRICT && mode != SImode")) (ior (match_operand 0 "short_displacement_mem_operand") - (match_operand 0 "gbr_address_mem")))))) + (match_operand 0 "gbr_address_mem")))) + (ior (match_test "!TARGET_ATOMIC_HARD_LLCS") + (not (match_operand 0 "gbr_address_mem"))))) (define_expand "atomic_fetch_<fetchop_name><mode>" [(set (match_operand:QIHISI 0 "arith_reg_dest") diff --git a/gcc/lra-constraints.cc b/gcc/lra-constraints.cc index a25487be299..e1ff662c1d5 100644 --- a/gcc/lra-constraints.cc +++ b/gcc/lra-constraints.cc @@ -554,7 +554,11 @@ get_equiv (rtx x) return res; } if ((res = ira_reg_equiv[regno].constant) != NULL_RTX) - return res; + { + if (targetm.cannot_substitute_const_equiv_p (res)) + return x; + return res; + } if ((res = ira_reg_equiv[regno].invariant) != NULL_RTX) return res; gcc_unreachable (); diff --git a/gcc/target.def b/gcc/target.def index 206c94f8749..4411d4f810a 100644 --- a/gcc/target.def +++ b/gcc/target.def @@ -6240,9 +6240,24 @@ DEFHOOK substitute safely pseudos with equivalent memory values during\n\ register allocation.\n\ The default version of this target hook returns @code{false}.\n\ -On most machines, this default should be used. 
For generally\n\ -machines with non orthogonal register usage for addressing, such\n\ -as SH, this hook can be used to avoid excessive spilling.", +On most machines, this default should be used. For machines with\n\ +non-orthogonal register usage for addressing, such as SH,\n\ +this hook can be used to avoid excessive spilling.", + bool, (rtx subst), + hook_bool_rtx_false) + +/* This target hook allows the backend to avoid unsafe substitution + during register allocation. */ +DEFHOOK +(cannot_substitute_const_equiv_p, + "A target hook which returns @code{true} if @var{subst} can't\n\ +substitute safely pseudos with equivalent constant values during\n\ +register allocation.\n\ +The default version of this target hook returns @code{false}.\n\ +On most machines, this default should be used. For machines with\n\ +special constant load instructions that have additional constraints\n\ +or being dependent on mode-switching, such as SH, this hook can be\n\ +used to avoid unsafe substitution.", bool, (rtx subst), hook_bool_rtx_false) -- 2.51.0
>From c2d30267327ee7339214b5bd070ff9aeb37118f6 Mon Sep 17 00:00:00 2001
From: Oleg Endo <[email protected]>
Date: Sun, 29 Sep 2024 21:33:29 +0900
Subject: [PATCH] SH: Add support for LRA and enable it by default

SH: Tighten memory predicates and constraints

In particular, reject invalid hard-regs for memory address registers when
using LRA. Unfortunately, we need to distinguish between old reload and LRA
behaviors for the transitional period. LRA seems to require stricter
predicates and constraints.

gcc/ChangeLog:

	PR target/55212
	* config/sh/predicates.md (simple_mem_operand): Use
	'satisfies_constraint_Sra'.
	(post_inc_mem, pre_dec_mem): Use 'satisfies_constraint_Rab'.
	* config/sh/constraints.md (Rab, Rai, Sgb): New constraints.
	(Sua, Sdd, Snd, Ssd, Sbv, Sra, Ara, Add): Use Rab and Rai constraints.
	* config/sh/sync.md (atomic_mem_operand_0, atomic_mem_operand_1): Reject
	GBR addresses when hard-llcs atomic mode is enabled.

SH: pin input args to hard-regs via predicates for sfuncs

Some sfuncs use hard regs as inputs and clobber them with raw reg patterns.
It seems that LRA doesn't process this clobber pattern. Rewrite these
patterns so that they work with LRA.

gcc/ChangeLog:

	* config/sh/predicates.md (hard_reg_r4, hard_reg_r5, hard_reg_r6): New
	predicates.
	* config/sh/sh.md (udivsi3_i4, udivsi3_i4_single, udivsi3_i1): Rewrite
	with match_operand and match_dup.
	(block_lump_real, block_lump_real_i4): Ditto.
	(udivsi3): Adjust for it.
	* config/sh/sh-mem.cc (expand_block_move): Ditto.

LRA: Add cannot_substitute_const_equiv_p target hook

On SH, the special fp constant load instructions 'fldi0' and 'fldi1' are
only valid for single-precision fp mode and thus depend on mode-switching.
Since LRA is not aware of that, it would emit such constant loads in the
wrong mode. The new target hook allows rejecting such potentially unsafe
substitutions.

gcc/ChangeLog:

	PR target/117182
	* target.def (cannot_substitute_const_equiv_p): New target hook.
	* doc/tm.texi.in: Add it.
	* lra-constraints.cc (get_equiv): Use it.
	* config/sh/sh.cc (sh_cannot_substitute_const_equiv_p): Override it.
	* doc/tm.texi: Re-generate.

SH: try to work around fp-reg related move insns

LRA tries to satisfy the constraints in match_scratch for the memory
displacements, which causes problems on this target. To mitigate the issue,
split movsf_ie_ra into several new patterns to remove match_scratch. Also
define a new sub-pattern of movdf for constant loads.

gcc/ChangeLog:

	* config/sh/predicates.md (pc_relative_load_operand): New predicate.
	* config/sh/sh-protos.h (sh_movsf_ie_ra_split_p): Remove.
	(sh_movsf_ie_y_split_p): New proto.
	* config/sh/sh.cc (sh_movsf_ie_ra_split_p): Remove.
	(sh_movsf_ie_y_split_p): New function.
	(broken_move): Take movsf_ie_ra into account for fldi cases.
	* config/sh/sh.md (movdf_i4_F_z): New insn pattern.
	(movdf): Use it.
	(movsf_ie_ra): Use define_insn instead of define_insn_and_split.
	(movsf_ie_F_z, movsf_ie_Q_z, movsf_ie_y): New insn patterns.
	(movsf): Use new patterns.
	(movsf-1): Don't split when operands[0] or operands[1] is fpul.
	(movdf_i4_F_z+7): New splitter.

SH: Try to work around fp-reg related move insns pt.2

The current movsf logic for LRA doesn't work well for reg moves from/to a
multiword subreg. Use a separate pattern movsf_ie_rffr for that case. Also,
movsf_ie_ra should be disabled for reg moves from/to a subreg of SImode.
Otherwise it is recognized as such a move when the subreg1 pass tries to
split multiword moves, because the constraints aren't effective at that
stage.

gcc/ChangeLog:

	PR target/55212
	* config/sh/sh-protos.h (sh_movsf_ie_subreg_multiword_p): New proto.
	* config/sh/sh.cc (sh_movsf_ie_subreg_multiword_p): New function.
	* config/sh/sh.md (movsf_ie_rffr): New insn_and_split.
	(movsf): Use movsf_ie_rffr when sh_movsf_ie_subreg_multiword_p is true.
	(movsf_ie_ra): Disable when sh_movsf_ie_y_split_p is true.

SH: Try to reduce R0 live ranges

Some move or extend patterns create long R0 live ranges, which can confuse
LRA.

gcc/ChangeLog:

	* config/sh/sh-protos.h (sh_satisfies_constraint_Sid_subreg_index):
	Declare.
	* config/sh/sh.cc (sh_satisfies_constraint_Sid_subreg_index): New
	function.
	* config/sh/sh.md (extend<mode>si2_short_mem_disp_z,
	*mov<mode>_store_mem_index, mov<mode>_store_mem_index): New insn and
	insn_and_split patterns.
	(extend<mode>si2, mov<mode>): Use them for LRA.

SH: Fix the condition to use movsf_ie_y pattern.

gcc/ChangeLog:

	* config/sh/sh.cc (sh_movsf_ie_y_split_p): Take the subreg of DImode
	into account.

SH: enable LRA by default

gcc/ChangeLog:

	PR target/55212
	* config/sh/sh.opt (sh_lra_flag): Init to 1.
---
 gcc/config/sh/constraints.md |  65 +++++--
 gcc/config/sh/predicates.md  |  37 +++-
 gcc/config/sh/sh-mem.cc      |   4 +-
 gcc/config/sh/sh-protos.h    |   4 +-
 gcc/config/sh/sh.cc          |  79 ++++++--
 gcc/config/sh/sh.md          | 337 ++++++++++++++++++++++++++++-------
 gcc/config/sh/sh.opt         |   2 +-
 gcc/config/sh/sync.md        |   8 +-
 gcc/doc/tm.texi              |  17 +-
 gcc/doc/tm.texi.in           |   2 +
 gcc/lra-constraints.cc       |   6 +-
 gcc/target.def               |  21 ++-
 12 files changed, 464 insertions(+), 118 deletions(-)

diff --git a/gcc/config/sh/constraints.md b/gcc/config/sh/constraints.md index 51569cdfd2d..7b476dc6c38 100644 --- a/gcc/config/sh/constraints.md +++ b/gcc/config/sh/constraints.md @@ -45,8 +45,10 @@ ;; H: Floating point 1 ;; Q: pc relative load operand ;; Rxx: reserved for exotic register classes. +;; Rab: address base register +;; Rai: address index register ;; Sxx: extra memory constraints -;; Sua: unaligned memory address +;; Sua: simple or post-inc address (for unaligned load) ;; Sbv: QImode address without displacement ;; Sbw: QImode address with 12 bit displacement ;; Snd: address without displacement @@ -260,16 +262,36 @@ (match_test "~ival == 64") (match_test "~ival == 128")))) +;; FIXME: LRA and reload behavior differs in memory constraint handling. +;; For LRA memory address constraints need to narrow the register type +;; restrictions. It seems the address RTX validation is done slightly +;; differently. 
Remove the non-LRA paths eventually. +(define_constraint "Rab" + "@internal address base register constraint" + (ior (and (match_test "sh_lra_p ()") + (match_test "MAYBE_BASE_REGISTER_RTX_P (op, false)")) + (and (match_test "!sh_lra_p ()") + (match_code "reg")))) + +(define_constraint "Rai" + "@internal address index register constraint" + (ior (and (match_test "sh_lra_p ()") + (match_test "MAYBE_INDEX_REGISTER_RTX_P (op, false)")) + (and (match_test "!sh_lra_p ()") + (match_code "reg")))) + (define_memory_constraint "Sua" - "@internal" - (and (match_test "memory_operand (op, GET_MODE (op))") - (match_test "GET_CODE (XEXP (op, 0)) != PLUS"))) + "A memory reference that allows simple register or post-inc addressing." + (and (match_code "mem") + (ior (match_test "satisfies_constraint_Rab (XEXP (op, 0))") + (and (match_code "post_inc" "0") + (match_test "satisfies_constraint_Rab (XEXP (XEXP (op, 0), 0))"))))) (define_memory_constraint "Sdd" "A memory reference that uses displacement addressing." (and (match_code "mem") (match_code "plus" "0") - (match_code "reg" "00") + (match_test "satisfies_constraint_Rab (XEXP (XEXP (op, 0), 0))") (match_code "const_int" "01"))) (define_memory_constraint "Snd" @@ -281,19 +303,28 @@ "A memory reference that uses index addressing." (and (match_code "mem") (match_code "plus" "0") - (match_code "reg" "00") - (match_code "reg" "01"))) + (ior (and (match_test "satisfies_constraint_Rab (XEXP (XEXP (op, 0), 0))") + (match_test "satisfies_constraint_Rai (XEXP (XEXP (op, 0), 1))")) + (and (match_test "satisfies_constraint_Rab (XEXP (XEXP (op, 0), 1))") + (match_test "satisfies_constraint_Rai (XEXP (XEXP (op, 0), 0))"))))) (define_memory_constraint "Ssd" "A memory reference that excludes index and displacement addressing." - (and (match_code "mem") - (match_test "! satisfies_constraint_Sid (op)") - (match_test "! satisfies_constraint_Sdd (op)"))) + (ior (and (match_code "mem") + (match_test "! sh_lra_p ()") + (match_test "! 
satisfies_constraint_Sid (op)") + (match_test "! satisfies_constraint_Sdd (op)")) + (and (match_code "mem") + (match_test "sh_lra_p ()") + (ior (match_test "satisfies_constraint_Rab (XEXP (op, 0))") + (and (ior (match_code "pre_dec" "0") (match_code "post_inc" "0")) + (match_test "satisfies_constraint_Rab (XEXP (XEXP (op, 0), 0))")))))) (define_memory_constraint "Sbv" "A memory reference, as used in SH2A bclr.b, bset.b, etc." - (and (match_test "MEM_P (op) && GET_MODE (op) == QImode") - (match_test "REG_P (XEXP (op, 0))"))) + (and (match_code "mem") + (match_test "GET_MODE (op) == QImode") + (match_test "satisfies_constraint_Rab (XEXP (op, 0))"))) (define_memory_constraint "Sbw" "A memory reference, as used in SH2A bclr.b, bset.b, etc." @@ -304,13 +335,17 @@ (define_memory_constraint "Sra" "A memory reference that uses simple register addressing." (and (match_code "mem") - (match_code "reg" "0"))) + (match_test "satisfies_constraint_Rab (XEXP (op, 0))"))) + +(define_memory_constraint "Sgb" + "A memory reference that uses GBR addressing." + (match_test "gbr_address_mem (op, GET_MODE (op))")) (define_memory_constraint "Ara" "A memory reference that uses simple register addressing suitable for gusa atomic operations." (and (match_code "mem") - (match_code "reg" "0") + (match_test "satisfies_constraint_Rab (XEXP (op, 0))") (match_test "REGNO (XEXP (op, 0)) != SP_REG"))) (define_memory_constraint "Add" @@ -319,6 +354,6 @@ (and (match_code "mem") (match_test "GET_MODE (op) == SImode") (match_code "plus" "0") - (match_code "reg" "00") + (match_test "satisfies_constraint_Rab (XEXP (XEXP (op, 0), 0))") (match_code "const_int" "01") (match_test "REGNO (XEXP (XEXP (op, 0), 0)) != SP_REG"))) diff --git a/gcc/config/sh/predicates.md b/gcc/config/sh/predicates.md index f7051087213..868fdb9d57f 100644 --- a/gcc/config/sh/predicates.md +++ b/gcc/config/sh/predicates.md @@ -208,8 +208,7 @@ ;; Returns 1 if OP is a simple register address. 
(define_predicate "simple_mem_operand" (and (match_code "mem") - (match_code "reg" "0") - (match_test "arith_reg_operand (XEXP (op, 0), SImode)"))) + (match_test "satisfies_constraint_Sra (op)"))) ;; Returns 1 if OP is a valid displacement address. (define_predicate "displacement_mem_operand" @@ -239,13 +238,13 @@ (define_predicate "post_inc_mem" (and (match_code "mem") (match_code "post_inc" "0") - (match_code "reg" "00"))) + (match_test "satisfies_constraint_Rab (XEXP (XEXP (op, 0), 0))"))) ;; Returns true if OP is a pre-decrement addressing mode memory reference. (define_predicate "pre_dec_mem" (and (match_code "mem") (match_code "pre_dec" "0") - (match_code "reg" "00"))) + (match_test "satisfies_constraint_Rab (XEXP (XEXP (op, 0), 0))"))) ;; Returns 1 if the operand can be used in an SH2A movu.{b|w} insn. (define_predicate "zero_extend_movu_operand" @@ -485,6 +484,17 @@ && sh_legitimate_index_p (mode, XEXP (plus0_rtx, 1), TARGET_SH2A, true); }) +;; Returns true if OP is a pc relative load operand. +(define_predicate "pc_relative_load_operand" + (match_code "mem") +{ + if (GET_MODE (op) != QImode + && IS_PC_RELATIVE_LOAD_ADDR_P (XEXP (op, 0))) + return true; + + return false; +}) + ;; Returns true if OP is a valid source operand for a logical operation. (define_predicate "logical_operand" (and (match_code "subreg,reg,const_int") @@ -805,3 +815,22 @@ return false; }) + +;; Predicates for the arguments of sfunc R4, R5 and R6. 
+(define_predicate "hard_reg_r4" + (match_code "reg") +{ + return REGNO (op) == R4_REG; +}) + +(define_predicate "hard_reg_r5" + (match_code "reg") +{ + return REGNO (op) == R5_REG; +}) + +(define_predicate "hard_reg_r6" + (match_code "reg") +{ + return REGNO (op) == R6_REG; +}) diff --git a/gcc/config/sh/sh-mem.cc b/gcc/config/sh/sh-mem.cc index 980b55a3a6c..bda7ff3d9da 100644 --- a/gcc/config/sh/sh-mem.cc +++ b/gcc/config/sh/sh-mem.cc @@ -134,7 +134,7 @@ expand_block_move (rtx *operands) int dwords = bytes >> 3; emit_insn (gen_move_insn (r6, GEN_INT (dwords - 1))); - emit_insn (gen_block_lump_real_i4 (func_addr_rtx, lab)); + emit_insn (gen_block_lump_real_i4 (func_addr_rtx, lab, r4, r5, r6)); return true; } else @@ -178,7 +178,7 @@ expand_block_move (rtx *operands) final_switch = 16 - ((bytes / 4) % 16); while_loop = ((bytes / 4) / 16 - 1) * 16; emit_insn (gen_move_insn (r6, GEN_INT (while_loop + final_switch))); - emit_insn (gen_block_lump_real (func_addr_rtx, lab)); + emit_insn (gen_block_lump_real (func_addr_rtx, lab, r4, r5, r6)); return true; } diff --git a/gcc/config/sh/sh-protos.h b/gcc/config/sh/sh-protos.h index 41ab6101ae1..a2a374f5f31 100644 --- a/gcc/config/sh/sh-protos.h +++ b/gcc/config/sh/sh-protos.h @@ -61,6 +61,7 @@ extern rtx legitimize_pic_address (rtx, machine_mode, rtx); extern bool nonpic_symbol_mentioned_p (rtx); extern void output_pic_addr_const (FILE *, rtx); extern bool expand_block_move (rtx *); +extern bool sh_satisfies_constraint_Sid_subreg_index (rtx); extern void prepare_move_operands (rtx[], machine_mode mode); extern bool sh_expand_cmpstr (rtx *); extern bool sh_expand_cmpnstr (rtx *); @@ -102,7 +103,8 @@ extern rtx sh_find_equiv_gbr_addr (rtx_insn* cur_insn, rtx mem); extern int sh_eval_treg_value (rtx op); extern HOST_WIDE_INT sh_disp_addr_displacement (rtx mem_op); extern int sh_max_mov_insn_displacement (machine_mode mode, bool consider_sh2a); -extern bool sh_movsf_ie_ra_split_p (rtx, rtx, rtx); +extern bool 
sh_movsf_ie_y_split_p (rtx, rtx); +extern bool sh_movsf_ie_subreg_multiword_p (rtx, rtx); extern void sh_expand_sym_label2reg (rtx, rtx, rtx, bool); /* Result value of sh_find_set_of_reg. */ diff --git a/gcc/config/sh/sh.cc b/gcc/config/sh/sh.cc index d9b319c2377..4c42abe21d3 100644 --- a/gcc/config/sh/sh.cc +++ b/gcc/config/sh/sh.cc @@ -271,6 +271,7 @@ static bool sh_legitimate_address_p (machine_mode, rtx, bool, static rtx sh_legitimize_address (rtx, rtx, machine_mode); static rtx sh_delegitimize_address (rtx); static bool sh_cannot_substitute_mem_equiv_p (rtx); +static bool sh_cannot_substitute_const_equiv_p (rtx); static bool sh_legitimize_address_displacement (rtx *, rtx *, poly_int64, machine_mode); static int scavenge_reg (HARD_REG_SET *s); @@ -612,6 +613,9 @@ TARGET_GNU_ATTRIBUTES (sh_attribute_table, #undef TARGET_CANNOT_SUBSTITUTE_MEM_EQUIV_P #define TARGET_CANNOT_SUBSTITUTE_MEM_EQUIV_P sh_cannot_substitute_mem_equiv_p +#undef TARGET_CANNOT_SUBSTITUTE_CONST_EQUIV_P +#define TARGET_CANNOT_SUBSTITUTE_CONST_EQUIV_P sh_cannot_substitute_const_equiv_p + #undef TARGET_LEGITIMIZE_ADDRESS_DISPLACEMENT #define TARGET_LEGITIMIZE_ADDRESS_DISPLACEMENT \ sh_legitimize_address_displacement @@ -1580,6 +1584,18 @@ sh_encode_section_info (tree decl, rtx rtl, int first) SYMBOL_REF_FLAGS (XEXP (rtl, 0)) |= SYMBOL_FLAG_FUNCVEC_FUNCTION; } +/* Test Sid constraint with subreg index. See also the comment in + prepare_move_operands. */ +bool +sh_satisfies_constraint_Sid_subreg_index (rtx op) +{ + return ((GET_CODE (op) == MEM) + && ((GET_CODE (XEXP (op, 0)) == PLUS) + && ((GET_CODE (XEXP (XEXP (op, 0), 0)) == REG) + && ((GET_CODE (XEXP (XEXP (op, 0), 1)) == SUBREG) + && (GET_CODE (XEXP (XEXP (XEXP (op, 0), 1), 0)) == REG))))); +} + /* Prepare operands for a move define_expand; specifically, one of the operands must be in a register. */ void @@ -4829,6 +4845,7 @@ broken_move (rtx_insn *insn) we changed this to do a constant load. 
In that case we don't have an r0 clobber, hence we must use fldi. */ && (TARGET_FMOVD + || sh_lra_p () || (GET_CODE (XEXP (XVECEXP (PATTERN (insn), 0, 2), 0)) == SCRATCH)) && REG_P (SET_DEST (pat)) @@ -11431,6 +11448,19 @@ sh_cannot_substitute_mem_equiv_p (rtx) return true; } +static bool +sh_cannot_substitute_const_equiv_p (rtx subst) +{ + /* If SUBST is SFmode const_double 0 or 1, the move insn may be + transformed into fldi0/1. This is unsafe for fp mode switching + because fldi0/1 are single mode only instructions. */ + if (GET_MODE (subst) == SFmode + && (real_equal (CONST_DOUBLE_REAL_VALUE (subst), &dconst1) + || real_equal (CONST_DOUBLE_REAL_VALUE (subst), &dconst0))) + return true; + return false; +} + /* Implement TARGET_LEGITIMIZE_ADDRESS_DISPLACEMENT. */ static bool sh_legitimize_address_displacement (rtx *offset1, rtx *offset2, @@ -11452,30 +11482,41 @@ sh_legitimize_address_displacement (rtx *offset1, rtx *offset2, return false; } -/* Return true if movsf insn should be splited with an additional - register. */ +/* Return true if movsf insn should be splited with fpul register. */ bool -sh_movsf_ie_ra_split_p (rtx op0, rtx op1, rtx op2) +sh_movsf_ie_y_split_p (rtx op0, rtx op1) { - /* op0 == op1 */ - if (rtx_equal_p (op0, op1)) + /* f, r */ + if (REG_P (op0) + && (SUBREG_P (op1) + && (GET_MODE (SUBREG_REG (op1)) == SImode + || GET_MODE (SUBREG_REG (op1)) == DImode))) return true; - /* fy, FQ, reg */ - if (GET_CODE (op1) == CONST_DOUBLE - && ! satisfies_constraint_G (op1) - && ! satisfies_constraint_H (op1) - && REG_P (op0) - && REG_P (op2)) + /* r, f */ + if (REG_P (op1) + && (SUBREG_P (op0) + && (GET_MODE (SUBREG_REG (op0)) == SImode + || GET_MODE (SUBREG_REG (op0)) == DImode))) return true; - /* f, r, y */ - if (REG_P (op0) && FP_REGISTER_P (REGNO (op0)) - && REG_P (op1) && GENERAL_REGISTER_P (REGNO (op1)) - && REG_P (op2) && (REGNO (op2) == FPUL_REG)) + + return false; +} + +/* Return true if it moves reg from/to subreg of multiword mode. 
*/ +bool +sh_movsf_ie_subreg_multiword_p (rtx op0, rtx op1) +{ + if (REG_P (op0) + && (SUBREG_P (op1) + && (GET_MODE (SUBREG_REG (op1)) == SCmode + || GET_MODE (SUBREG_REG (op1)) == DImode + || GET_MODE (SUBREG_REG (op1)) == TImode))) return true; - /* r, f, y */ - if (REG_P (op1) && FP_REGISTER_P (REGNO (op1)) - && REG_P (op0) && GENERAL_REGISTER_P (REGNO (op0)) - && REG_P (op2) && (REGNO (op2) == FPUL_REG)) + if (REG_P (op1) + && (SUBREG_P (op0) + && (GET_MODE (SUBREG_REG (op0)) == SCmode + || GET_MODE (SUBREG_REG (op0)) == DImode + || GET_MODE (SUBREG_REG (op0)) == TImode))) return true; return false; diff --git a/gcc/config/sh/sh.md b/gcc/config/sh/sh.md index 75ec87b8851..6d311f7a426 100644 --- a/gcc/config/sh/sh.md +++ b/gcc/config/sh/sh.md @@ -2194,13 +2194,24 @@ ;; there is nothing to prevent reload from using r0 to reload the address. ;; This reload would clobber the value in r0 we are trying to store. ;; If we let reload allocate r0, then this problem can never happen. +;; +;; In addition to that, we also must pin the input regs to hard-regs via the +;; predicates. When these insns are instantiated it also emits the +;; accompanying mov insns to load the hard-regs. However, subsequent RTL +;; passes might move things around and reassign the operands to pseudo regs +;; which might get allocated to different (wrong) hard-regs eventually. To +;; avoid that, only allow matching these insns if the operands are the +;; expected hard-regs. 
(define_insn "udivsi3_i1" [(set (match_operand:SI 0 "register_operand" "=z,z") - (udiv:SI (reg:SI R4_REG) (reg:SI R5_REG))) + (udiv:SI (match_operand:SI 3 "hard_reg_r4" "=r,r") + (match_operand:SI 4 "hard_reg_r5" "=r,r"))) (clobber (reg:SI T_REG)) (clobber (reg:SI PR_REG)) (clobber (reg:SI R1_REG)) - (clobber (reg:SI R4_REG)) + (clobber (match_dup 3)) + (use (reg:SI R4_REG)) + (use (reg:SI R5_REG)) (use (match_operand:SI 1 "arith_reg_operand" "r,r")) (use (match_operand 2 "" "Z,Ccl"))] "TARGET_SH1 && TARGET_DIVIDE_CALL_DIV1" @@ -2212,7 +2223,8 @@ (define_insn "udivsi3_i4" [(set (match_operand:SI 0 "register_operand" "=y,y") - (udiv:SI (reg:SI R4_REG) (reg:SI R5_REG))) + (udiv:SI (match_operand:SI 3 "hard_reg_r4" "=r,r") + (match_operand:SI 4 "hard_reg_r5" "=r,r"))) (clobber (reg:SI T_REG)) (clobber (reg:SI PR_REG)) (clobber (reg:DF DR0_REG)) @@ -2220,9 +2232,11 @@ (clobber (reg:DF DR4_REG)) (clobber (reg:SI R0_REG)) (clobber (reg:SI R1_REG)) - (clobber (reg:SI R4_REG)) - (clobber (reg:SI R5_REG)) + (clobber (match_dup 3)) + (clobber (match_dup 4)) (clobber (reg:SI FPSCR_STAT_REG)) + (use (reg:SI R4_REG)) + (use (reg:SI R5_REG)) (use (match_operand:SI 1 "arith_reg_operand" "r,r")) (use (match_operand 2 "" "Z,Ccl")) (use (reg:SI FPSCR_MODES_REG))] @@ -2236,7 +2250,8 @@ (define_insn "udivsi3_i4_single" [(set (match_operand:SI 0 "register_operand" "=y,y") - (udiv:SI (reg:SI R4_REG) (reg:SI R5_REG))) + (udiv:SI (match_operand:SI 3 "hard_reg_r4" "=r,r") + (match_operand:SI 4 "hard_reg_r5" "=r,r"))) (clobber (reg:SI T_REG)) (clobber (reg:SI PR_REG)) (clobber (reg:DF DR0_REG)) @@ -2244,8 +2259,10 @@ (clobber (reg:DF DR4_REG)) (clobber (reg:SI R0_REG)) (clobber (reg:SI R1_REG)) - (clobber (reg:SI R4_REG)) - (clobber (reg:SI R5_REG)) + (clobber (match_dup 3)) + (clobber (match_dup 4)) + (use (reg:SI R4_REG)) + (use (reg:SI R5_REG)) (use (match_operand:SI 1 "arith_reg_operand" "r,r")) (use (match_operand 2 "" "Z,Ccl"))] "TARGET_FPU_ANY && TARGET_FPU_SINGLE" @@ -2278,6 +2295,8 
@@ { rtx last; rtx func_ptr = gen_reg_rtx (Pmode); + rtx r4 = gen_rtx_REG (SImode, R4_REG); + rtx r5 = gen_rtx_REG (SImode, R5_REG); /* Emit the move of the address to a pseudo outside of the libcall. */ if (TARGET_DIVIDE_CALL_TABLE) @@ -2305,9 +2324,9 @@ { rtx lab = function_symbol (func_ptr, "__udivsi3_i4", SFUNC_STATIC).lab; if (TARGET_FPU_SINGLE) - last = gen_udivsi3_i4_single (operands[0], func_ptr, lab); + last = gen_udivsi3_i4_single (operands[0], func_ptr, lab, r4, r5); else - last = gen_udivsi3_i4 (operands[0], func_ptr, lab); + last = gen_udivsi3_i4 (operands[0], func_ptr, lab, r4, r5); } else if (TARGET_SH2A) { @@ -2319,10 +2338,10 @@ else { rtx lab = function_symbol (func_ptr, "__udivsi3", SFUNC_STATIC).lab; - last = gen_udivsi3_i1 (operands[0], func_ptr, lab); + last = gen_udivsi3_i1 (operands[0], func_ptr, lab, r4, r5); } - emit_move_insn (gen_rtx_REG (SImode, 4), operands[1]); - emit_move_insn (gen_rtx_REG (SImode, 5), operands[2]); + emit_move_insn (r4, operands[1]); + emit_move_insn (r5, operands[2]); emit_insn (last); DONE; }) @@ -4820,7 +4839,38 @@ (define_expand "extend<mode>si2" [(set (match_operand:SI 0 "arith_reg_dest") - (sign_extend:SI (match_operand:QIHI 1 "general_extend_operand")))]) + (sign_extend:SI (match_operand:QIHI 1 "general_extend_operand")))] + "" +{ + /* When the displacement addressing is used, RA will assign r0 to + the pseudo register operand for the QI/HImode load. See + the comment in sh.cc:prepare_move_operand and PR target/55212. */ + if (! lra_in_progress && ! reload_completed + && sh_lra_p () + && ! 
TARGET_SH2A + && arith_reg_dest (operands[0], <MODE>mode) + && short_displacement_mem_operand (operands[1], <MODE>mode)) + { + emit_insn (gen_extend<mode>si2_short_mem_disp_z (operands[0], + operands[1])); + DONE; + } +}) + +(define_insn_and_split "extend<mode>si2_short_mem_disp_z" + [(set (match_operand:SI 0 "arith_reg_dest" "=r") + (sign_extend:SI + (match_operand:QIHI 1 "short_displacement_mem_operand" "m"))) + (clobber (reg:SI R0_REG))] + "TARGET_SH1 && ! TARGET_SH2A && sh_lra_p ()" + "#" + "&& 1" + [(set (match_dup 2) (sign_extend:SI (match_dup 1))) + (set (match_dup 0) (match_dup 2))] +{ + operands[2] = gen_rtx_REG (SImode, R0_REG); +} + [(set_attr "type" "load")]) (define_insn_and_split "*extend<mode>si2_compact_reg" [(set (match_operand:SI 0 "arith_reg_dest" "=r") @@ -5362,9 +5412,50 @@ operands[1] = gen_lowpart (<MODE>mode, reg); } + if (! lra_in_progress && ! reload_completed + && sh_lra_p () + && ! TARGET_SH2A + && arith_reg_operand (operands[1], <MODE>mode) + && (satisfies_constraint_Sid (operands[0]) + || sh_satisfies_constraint_Sid_subreg_index (operands[0]))) + { + rtx adr = XEXP (operands[0], 0); + rtx base = XEXP (adr, 0); + rtx idx = XEXP (adr, 1); + emit_insn (gen_mov<mode>_store_mem_index (base, idx, + operands[1])); + DONE; + } + prepare_move_operands (operands, <MODE>mode); }) +(define_insn "*mov<mode>_store_mem_index" + [(set (mem:QIHI + (plus:SI (match_operand:SI 0 "arith_reg_operand" "%r") + (match_operand:SI 1 "arith_reg_operand" "z"))) + (match_operand:QIHI 2 "arith_reg_operand" "r"))] + "TARGET_SH1 && ! TARGET_SH2A && sh_lra_p () + && REG_P (operands[1]) && REGNO (operands[1]) == R0_REG" + "mov.<bw> %2,@(%1,%0)" + [(set_attr "type" "store")]) + +(define_insn_and_split "mov<mode>_store_mem_index" + [(set (mem:QIHI + (plus:SI (match_operand:SI 0 "arith_reg_operand" "%r") + (match_operand:SI 1 "arith_reg_operand" "^zr"))) + (match_operand:QIHI 2 "arith_reg_operand" "r")) + (clobber (reg:SI R0_REG))] + "TARGET_SH1 && ! 
TARGET_SH2A && sh_lra_p ()"
+  "#"
+  "&& 1"
+  [(set (match_dup 3) (match_dup 1))
+   (set (mem:QIHI (plus:SI (match_dup 0) (match_dup 3))) (match_dup 2))]
+{
+  operands[3] = gen_rtx_REG (SImode, R0_REG);
+}
+  [(set_attr "type" "store")])
+
 ;; The pre-dec and post-inc mems must be captured by the '<' and '>'
 ;; constraints, otherwise wrong code might get generated.
 (define_insn "*mov<mode>_load_predec"
@@ -5650,6 +5741,22 @@
           (const_string "double")
           (const_string "none")))])

+;; LRA will try to satisfy the constraints in match_scratch for the memory
+;; displacements and it will make issues on this target. Use R0 as a scratch
+;; register for the constant load.
+(define_insn "movdf_i4_F_z"
+  [(set (match_operand:DF 0 "fp_arith_reg_operand" "=d")
+        (match_operand:DF 1 "const_double_operand" "F"))
+   (use (reg:SI FPSCR_MODES_REG))
+   (clobber (reg:SI R0_REG))]
+  "TARGET_FPU_DOUBLE && sh_lra_p ()"
+  "#"
+  [(set_attr "type" "pcfload")
+   (set (attr "length") (if_then_else (eq_attr "fmovd" "yes") (const_int 4) (const_int 8)))
+   (set (attr "fp_mode") (if_then_else (eq_attr "fmovd" "yes")
+                                           (const_string "double")
+                                           (const_string "none")))])
+
 ;; Moving DFmode between fp/general registers through memory
 ;; (the top of the stack) is faster than moving through fpul even for
 ;; little endian. Because the type of an instruction is important for its
@@ -5789,6 +5896,15 @@
   [(set (match_dup 0) (match_dup 0))]
   "")

+(define_split
+  [(set (match_operand:SF 0 "register_operand" "")
+        (match_operand:SF 1 "register_operand" ""))
+   (use (reg:SI FPSCR_MODES_REG))]
+  "TARGET_SH2E && sh_lra_p () && reload_completed
+   && true_regnum (operands[0]) == true_regnum (operands[1])"
+  [(set (match_dup 0) (match_dup 0))]
+  "")
+
 ;; fmovd substitute post-reload splits
 (define_split
   [(set (match_operand:DF 0 "register_operand" "")
@@ -6033,6 +6149,14 @@
   prepare_move_operands (operands, DFmode);
   if (TARGET_FPU_DOUBLE)
     {
+      if (sh_lra_p ()
+          && (GET_CODE (operands[1]) == CONST_DOUBLE
+              && REG_P (operands[0])))
+        {
+          emit_insn (gen_movdf_i4_F_z (operands[0], operands[1]));
+          DONE;
+        }
+
       emit_insn (gen_movdf_i4 (operands[0], operands[1]));
       DONE;
     }
@@ -6158,15 +6282,17 @@
       (const_string "none")
       (const_string "none")])])

-(define_insn_and_split "movsf_ie_ra"
+;; LRA will try to satisfy the constraints in match_scratch for the memory
+;; displacements and it will make issues on this target. movsf_ie is splitted
+;; into 4 patterns to avoid it when lra_in_progress is true.
+(define_insn "movsf_ie_ra"
   [(set (match_operand:SF 0 "general_movdst_operand"
-                                "=f,r,f,f,fy,f,m, r,r,m,f,y,y,rf,r,y,<,y,y")
+                                "=f,r,f,f,f,m, r,r,m,f,y,y,r,y,<,y,y")
         (match_operand:SF 1 "general_movsrc_operand"
-                                " f,r,G,H,FQ,m,f,FQ,m,r,y,f,>,fr,y,r,y,>,y"))
-   (use (reg:SI FPSCR_MODES_REG))
-   (clobber (match_scratch:SF 2 "=r,r,X,X,&z,r,r, X,r,r,r,r,r, y,r,r,r,r,r"))
-   (const_int 0)]
-  "TARGET_SH2E
+                                " f,r,G,H,m,f,FQ,m,r,y,f,>,y,r,y,>,y"))
+   (use (reg:SI FPSCR_MODES_REG))]
+  "TARGET_SH2E && sh_lra_p ()
+   && ! sh_movsf_ie_y_split_p (operands[0], operands[1])
    && (arith_reg_operand (operands[0], SFmode)
        || fpul_operand (operands[0], SFmode)
        || arith_reg_operand (operands[1], SFmode)
@@ -6176,7 +6302,6 @@
         mov     %1,%0
         fldi0   %0
         fldi1   %0
-        #
         fmov.s  %1,%0
         fmov.s  %1,%0
         mov.l   %1,%0
@@ -6185,31 +6310,19 @@
         fsts    fpul,%0
         flds    %1,fpul
         lds.l   %1,%0
-        #
         sts     %1,%0
         lds     %1,%0
         sts.l   %1,%0
         lds.l   %1,%0
         ! move optimized away"
-  "reload_completed
-   && sh_movsf_ie_ra_split_p (operands[0], operands[1], operands[2])"
-  [(const_int 0)]
-{
-  if (! rtx_equal_p (operands[0], operands[1]))
-    {
-      emit_insn (gen_movsf_ie (operands[2], operands[1]));
-      emit_insn (gen_movsf_ie (operands[0], operands[2]));
-    }
-}
-  [(set_attr "type" "fmove,move,fmove,fmove,pcfload,fload,fstore,pcload,load,
-                     store,fmove,fmove,load,*,fpul_gp,gp_fpul,fstore,load,nil")
-   (set_attr "late_fp_use" "*,*,*,*,*,*,yes,*,*,*,*,*,*,*,yes,*,yes,*,*")
+  [(set_attr "type" "fmove,move,fmove,fmove,fload,fstore,pcload,load,
+                     store,fmove,fmove,load,fpul_gp,gp_fpul,fstore,load,nil")
+   (set_attr "late_fp_use" "*,*,*,*,*,yes,*,*,*,*,*,*,yes,*,yes,*,*")
    (set_attr_alternative "length"
      [(const_int 2)
       (const_int 2)
       (const_int 2)
       (const_int 2)
-      (const_int 4)
       (if_then_else (match_operand 1 "displacement_mem_operand")
                     (const_int 4) (const_int 2))
       (if_then_else (match_operand 0 "displacement_mem_operand")
@@ -6222,7 +6335,6 @@
       (const_int 2)
       (const_int 2)
       (const_int 2)
-      (const_int 4)
       (const_int 2)
       (const_int 2)
       (const_int 2)
@@ -6234,7 +6346,6 @@
       (const_string "none")
       (const_string "single")
       (const_string "single")
-      (const_string "none")
       (if_then_else (eq_attr "fmovd" "yes")
                     (const_string "single") (const_string "none"))
       (if_then_else (eq_attr "fmovd" "yes")
@@ -6249,15 +6360,75 @@
       (const_string "none")
       (const_string "none")
       (const_string "none")
+      (const_string "none")])])
+
+(define_insn_and_split "movsf_ie_rffr"
+  [(set (match_operand:SF 0 "arith_reg_dest" "=f,r,rf")
+        (match_operand:SF 1 "arith_reg_operand" "f,r,fr"))
+   (use (reg:SI FPSCR_MODES_REG))
+   (clobber (match_scratch:SF 2 "=X,X,y"))]
+  "TARGET_SH2E && sh_lra_p ()"
+  "@
+        fmov    %1,%0
+        mov     %1,%0
+        #"
+  "reload_completed
+   && (FP_REGISTER_P (REGNO (operands[0]))
+       != FP_REGISTER_P (REGNO (operands[1])))"
+  [(const_int 0)]
+{
+  emit_insn (gen_movsf_ie_ra (operands[2], operands[1]));
+  emit_insn (gen_movsf_ie_ra (operands[0], operands[2]));
+}
+  [(set_attr "type" "fmove,move,*")
+   (set_attr_alternative "length"
+     [(const_int 2)
+      (const_int 2)
+      (const_int 4)])
+   (set_attr_alternative "fp_mode"
+     [(if_then_else (eq_attr "fmovd" "yes")
+                    (const_string "single") (const_string "none"))
       (const_string "none")
       (const_string "none")])])

+(define_insn "movsf_ie_F_z"
+  [(set (match_operand:SF 0 "fp_arith_reg_operand" "=f")
+        (match_operand:SF 1 "const_double_operand" "F"))
+   (use (reg:SI FPSCR_MODES_REG))
+   (clobber (reg:SI R0_REG))]
+  "TARGET_SH2E && sh_lra_p ()"
+  "#"
+  [(set_attr "type" "pcfload")
+   (set_attr "length" "4")])
+
+(define_insn "movsf_ie_Q_z"
+  [(set (match_operand:SF 0 "fpul_operand" "=y")
+        (match_operand:SF 1 "pc_relative_load_operand" "Q"))
+   (use (reg:SI FPSCR_MODES_REG))
+   (clobber (reg:SI R0_REG))]
+  "TARGET_SH2E && sh_lra_p ()"
+  "#"
+  [(set_attr "type" "pcfload")
+   (set_attr "length" "4")])
+
+(define_insn "movsf_ie_y"
+  [(set (match_operand:SF 0 "arith_reg_dest" "=fr")
+        (match_operand:SF 1 "arith_reg_operand" "rf"))
+   (use (reg:SI FPSCR_MODES_REG))
+   (clobber (reg:SI FPUL_REG))]
+  "TARGET_SH2E && sh_lra_p ()"
+  "#"
+  [(set_attr "type" "*")
+   (set_attr "length" "4")])
+
 (define_split
   [(set (match_operand:SF 0 "register_operand" "")
         (match_operand:SF 1 "register_operand" ""))
    (use (reg:SI FPSCR_MODES_REG))
    (clobber (reg:SI FPUL_REG))]
-  "TARGET_SH1"
+  "TARGET_SH1
+   && ! fpul_operand (operands[0], SFmode)
+   && ! fpul_operand (operands[1], SFmode)"
   [(parallel [(set (reg:SF FPUL_REG) (match_dup 1))
               (use (reg:SI FPSCR_MODES_REG))
               (clobber (scratch:SI))])
@@ -6274,11 +6445,37 @@
   prepare_move_operands (operands, SFmode);
   if (TARGET_SH2E)
     {
-      if (lra_in_progress)
+      if (sh_lra_p ())
         {
           if (GET_CODE (operands[0]) == SCRATCH)
             DONE;
-          emit_insn (gen_movsf_ie_ra (operands[0], operands[1]));
+          /* reg from/to multiword subreg may be splitted to several reg from/to
+             subreg of SImode by subreg1 pass. This confuses our splitted
+             movsf logic for LRA and will end up in bad code or ICE. Use a special
+             pattern so that LRA can optimize this case. */
+          if (! lra_in_progress && ! reload_completed
+              && sh_movsf_ie_subreg_multiword_p (operands[0], operands[1]))
+            {
+              emit_insn (gen_movsf_ie_rffr (operands[0], operands[1]));
+              DONE;
+            }
+          if (GET_CODE (operands[1]) == CONST_DOUBLE
+              && ! satisfies_constraint_G (operands[1])
+              && ! satisfies_constraint_H (operands[1])
+              && REG_P (operands[0]))
+            emit_insn (gen_movsf_ie_F_z (operands[0], operands[1]));
+          else if ((REG_P (operands[0]) && REGNO (operands[0]) == FPUL_REG)
+                   && satisfies_constraint_Q (operands[1]))
+            emit_insn (gen_movsf_ie_Q_z (operands[0], operands[1]));
+          else if (sh_movsf_ie_y_split_p (operands[0], operands[1]))
+            {
+              if (lra_in_progress)
+                emit_insn (gen_movsf_ie (operands[0], operands[1]));
+              else
+                emit_insn (gen_movsf_ie_y (operands[0], operands[1]));
+            }
+          else
+            emit_insn (gen_movsf_ie_ra (operands[0], operands[1]));
           DONE;
         }

@@ -8970,17 +9167,20 @@
    (set_attr "needs_delay_slot" "yes")])

 (define_insn "block_lump_real"
-  [(parallel [(set (mem:BLK (reg:SI R4_REG))
-                   (mem:BLK (reg:SI R5_REG)))
-              (use (match_operand:SI 0 "arith_reg_operand" "r,r"))
-              (use (match_operand 1 "" "Z,Ccl"))
-              (use (reg:SI R6_REG))
-              (clobber (reg:SI PR_REG))
-              (clobber (reg:SI T_REG))
-              (clobber (reg:SI R4_REG))
-              (clobber (reg:SI R5_REG))
-              (clobber (reg:SI R6_REG))
-              (clobber (reg:SI R0_REG))])]
+  [(set (mem:BLK (match_operand:SI 2 "hard_reg_r4" "=r,r"))
+        (mem:BLK (match_operand:SI 3 "hard_reg_r5" "=r,r")))
+   (use (match_operand:SI 0 "arith_reg_operand" "r,r"))
+   (use (match_operand 1 "" "Z,Ccl"))
+   (use (match_operand:SI 4 "hard_reg_r6" "=r,r"))
+   (use (reg:SI R4_REG))
+   (use (reg:SI R5_REG))
+   (use (reg:SI R6_REG))
+   (clobber (match_dup 2))
+   (clobber (match_dup 3))
+   (clobber (match_dup 4))
+   (clobber (reg:SI PR_REG))
+   (clobber (reg:SI T_REG))
+   (clobber (reg:SI R0_REG))]
   "TARGET_SH1 && ! TARGET_HARD_SH4"
   "@
         jsr     @%0%#
@@ -9005,20 +9205,23 @@
    (set_attr "needs_delay_slot" "yes")])

 (define_insn "block_lump_real_i4"
-  [(parallel [(set (mem:BLK (reg:SI R4_REG))
-                   (mem:BLK (reg:SI R5_REG)))
-              (use (match_operand:SI 0 "arith_reg_operand" "r,r"))
-              (use (match_operand 1 "" "Z,Ccl"))
-              (use (reg:SI R6_REG))
-              (clobber (reg:SI PR_REG))
-              (clobber (reg:SI T_REG))
-              (clobber (reg:SI R4_REG))
-              (clobber (reg:SI R5_REG))
-              (clobber (reg:SI R6_REG))
-              (clobber (reg:SI R0_REG))
-              (clobber (reg:SI R1_REG))
-              (clobber (reg:SI R2_REG))
-              (clobber (reg:SI R3_REG))])]
+  [(set (mem:BLK (match_operand:SI 2 "hard_reg_r4" "=r,r"))
+        (mem:BLK (match_operand:SI 3 "hard_reg_r5" "=r,r")))
+   (use (match_operand:SI 0 "arith_reg_operand" "r,r"))
+   (use (match_operand 1 "" "Z,Ccl"))
+   (use (match_operand:SI 4 "hard_reg_r6" "=r,r"))
+   (use (reg:SI R4_REG))
+   (use (reg:SI R5_REG))
+   (use (reg:SI R6_REG))
+   (clobber (match_dup 2))
+   (clobber (match_dup 3))
+   (clobber (match_dup 4))
+   (clobber (reg:SI PR_REG))
+   (clobber (reg:SI T_REG))
+   (clobber (reg:SI R0_REG))
+   (clobber (reg:SI R1_REG))
+   (clobber (reg:SI R2_REG))
+   (clobber (reg:SI R3_REG))]
   "TARGET_HARD_SH4"
   "@
         jsr     @%0%#
diff --git a/gcc/config/sh/sh.opt b/gcc/config/sh/sh.opt
index 1ef494d4df4..e5ab6c42a0c 100644
--- a/gcc/config/sh/sh.opt
+++ b/gcc/config/sh/sh.opt
@@ -299,5 +299,5 @@ Target Var(TARGET_FSRRA)
 Enable the use of the fsrra instruction.

 mlra
-Target Var(sh_lra_flag) Init(0) Save
+Target Var(sh_lra_flag) Init(1) Save
 Use LRA instead of reload (transitional).
diff --git a/gcc/config/sh/sync.md b/gcc/config/sh/sync.md
index 2eca3accbc8..c89e6e919bb 100644
--- a/gcc/config/sh/sync.md
+++ b/gcc/config/sh/sync.md
@@ -217,7 +217,9 @@
          (and (match_test "mode == SImode")
               (and (match_test "!TARGET_ATOMIC_HARD_LLCS")
                    (match_test "!TARGET_SH4A || TARGET_ATOMIC_STRICT"))
-              (match_operand 0 "short_displacement_mem_operand")))))
+              (match_operand 0 "short_displacement_mem_operand")))
+       (ior (match_test "!TARGET_ATOMIC_HARD_LLCS")
+            (not (match_operand 0 "gbr_address_mem")))))

 (define_expand "atomic_compare_and_swap<mode>"
   [(match_operand:SI 0 "arith_reg_dest")                ;; bool success output
@@ -715,7 +717,9 @@
                    && TARGET_SH4A && !TARGET_ATOMIC_STRICT
                    && mode != SImode"))
          (ior (match_operand 0 "short_displacement_mem_operand")
-              (match_operand 0 "gbr_address_mem"))))))
+              (match_operand 0 "gbr_address_mem"))))
+       (ior (match_test "!TARGET_ATOMIC_HARD_LLCS")
+            (not (match_operand 0 "gbr_address_mem")))))

 (define_expand "atomic_fetch_<fetchop_name><mode>"
   [(set (match_operand:QIHISI 0 "arith_reg_dest")
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index 394a46fdbaa..f5296fb18b4 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -3121,9 +3121,20 @@ A target hook which returns @code{true} if @var{subst} can't
 substitute safely pseudos with equivalent memory values during
 register allocation.
 The default version of this target hook returns @code{false}.
-On most machines, this default should be used. For generally
-machines with non orthogonal register usage for addressing, such
-as SH, this hook can be used to avoid excessive spilling.
+On most machines, this default should be used. For machines with
+non-orthogonal register usage for addressing, such as SH,
+this hook can be used to avoid excessive spilling.
+@end deftypefn
+
+@deftypefn {Target Hook} bool TARGET_CANNOT_SUBSTITUTE_CONST_EQUIV_P (rtx @var{subst})
+A target hook which returns @code{true} if @var{subst} can't
+substitute safely pseudos with equivalent constant values during
+register allocation.
+The default version of this target hook returns @code{false}.
+On most machines, this default should be used. For machines with
+special constant load instructions that have additional constraints
+or being dependent on mode-switching, such as SH, this hook can be
+used to avoid unsafe substitution.
 @end deftypefn

 @deftypefn {Target Hook} bool TARGET_LEGITIMIZE_ADDRESS_DISPLACEMENT (rtx *@var{offset1}, rtx *@var{offset2}, poly_int64 @var{orig_offset}, machine_mode @var{mode})
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index 274bb899d0c..19a47b7062b 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -2426,6 +2426,8 @@ in the reload pass.

 @hook TARGET_CANNOT_SUBSTITUTE_MEM_EQUIV_P

+@hook TARGET_CANNOT_SUBSTITUTE_CONST_EQUIV_P
+
 @hook TARGET_LEGITIMIZE_ADDRESS_DISPLACEMENT

 @hook TARGET_SPILL_CLASS
diff --git a/gcc/lra-constraints.cc b/gcc/lra-constraints.cc
index a25487be299..e1ff662c1d5 100644
--- a/gcc/lra-constraints.cc
+++ b/gcc/lra-constraints.cc
@@ -554,7 +554,11 @@ get_equiv (rtx x)
           return res;
         }
       if ((res = ira_reg_equiv[regno].constant) != NULL_RTX)
-        return res;
+        {
+          if (targetm.cannot_substitute_const_equiv_p (res))
+            return x;
+          return res;
+        }
       if ((res = ira_reg_equiv[regno].invariant) != NULL_RTX)
         return res;
       gcc_unreachable ();
diff --git a/gcc/target.def b/gcc/target.def
index 206c94f8749..4411d4f810a 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -6240,9 +6240,24 @@ DEFHOOK
 substitute safely pseudos with equivalent memory values during\n\
 register allocation.\n\
 The default version of this target hook returns @code{false}.\n\
-On most machines, this default should be used. For generally\n\
-machines with non orthogonal register usage for addressing, such\n\
-as SH, this hook can be used to avoid excessive spilling.",
+On most machines, this default should be used. For machines with\n\
+non-orthogonal register usage for addressing, such as SH,\n\
+this hook can be used to avoid excessive spilling.",
+ bool, (rtx subst),
+ hook_bool_rtx_false)
+
+/* This target hook allows the backend to avoid unsafe substitution
+   during register allocation. */
+DEFHOOK
+(cannot_substitute_const_equiv_p,
+ "A target hook which returns @code{true} if @var{subst} can't\n\
+substitute safely pseudos with equivalent constant values during\n\
+register allocation.\n\
+The default version of this target hook returns @code{false}.\n\
+On most machines, this default should be used. For machines with\n\
+special constant load instructions that have additional constraints\n\
+or being dependent on mode-switching, such as SH, this hook can be\n\
+used to avoid unsafe substitution.",
  bool, (rtx subst),
  hook_bool_rtx_false)

-- 
2.51.0
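
Note on the get_equiv change in lra-constraints.cc: the following is a minimal standalone sketch of the constant branch, not GCC code. It assumes a simplified model in which RTL values are plain strings; the names Equiv, ConstHook, get_equiv_model, default_hook and sh_like_hook are hypothetical and exist only for illustration. The SH-like hook here simply rejects the constants 0.0 and 1.0 (the values that fldi0 and fldi1 load), which is an assumption about what a backend might do, not a copy of the real sh.cc implementation.

```cpp
#include <cassert>
#include <functional>
#include <optional>
#include <string>

// Hypothetical stand-in for one ira_reg_equiv[] entry; RTL is modelled as text.
struct Equiv {
  std::optional<std::string> constant;
  std::optional<std::string> invariant;
};

// Hypothetical stand-in for targetm.cannot_substitute_const_equiv_p.
using ConstHook = std::function<bool(const std::string &)>;

// Models the patched constant branch of get_equiv: when the target hook
// rejects the constant, the pseudo itself is returned and nothing is
// substituted; otherwise the constant replaces the pseudo.
std::string get_equiv_model(const std::string &pseudo, const Equiv &eq,
                            const ConstHook &cannot_substitute_const) {
  if (eq.constant) {
    if (cannot_substitute_const(*eq.constant))
      return pseudo;
    return *eq.constant;
  }
  if (eq.invariant)
    return *eq.invariant;
  return pseudo;  // simplification: the real function has more cases
}

// Default behaviour (hook_bool_rtx_false in the patch): never reject.
bool default_hook(const std::string &) { return false; }

// Assumed SH-like policy: reject constants whose load depends on the FP mode.
bool sh_like_hook(const std::string &c) { return c == "0.0" || c == "1.0"; }
```

With default_hook the pseudo is always replaced by its constant equivalent, which matches the behaviour before the patch. With sh_like_hook, a pseudo equivalent to 1.0 stays a pseudo, so the register allocator never introduces a mode-dependent constant load on its own.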

