Richard Biener <rguent...@suse.de> writes: > On Fri, 25 Feb 2022, Jiufu Guo wrote: > >> Richard Biener <rguent...@suse.de> writes: >> >> > On Fri, 25 Feb 2022, Jiufu Guo wrote: >> > >> >> Richard Biener <rguent...@suse.de> writes: >> >> >> >> > On Thu, 24 Feb 2022, Jiufu Guo wrote: >> >> > >> >> >> Jiufu Guo via Gcc-patches <gcc-patches@gcc.gnu.org> writes: >> >> >> >> >> >> > Segher Boessenkool <seg...@kernel.crashing.org> writes: >> >> >> > >> >> >> >> On Wed, Feb 23, 2022 at 02:02:59PM +0100, Richard Biener wrote: >> >> >> >>> I'm assuming we're always dealing with >> >> >> >>> >> >> >> >>> (set (reg:MODE ..) <src_folded>) >> >> >> >>> >> >> >> >>> here and CSE is not substituting into random places of an >> >> >> >>> instruction(?). I don't know what 'rtx_cost' should evaluate >> >> >> >>> to for a constant, if it should implicitely evaluate the cost >> >> >> >>> of putting the result into a register for example. >> >> >> >> >> >> >> >> rtx_cost is no good here (and in most places). rtx_cost should be 0 >> >> >> >> for anything that is used as input in a machine instruction -- but >> >> >> >> you >> >> >> >> need much more context to determine that. insn_cost is much >> >> >> >> simpler and >> >> >> >> much easier to use. >> >> >> >> >> >> >> >>> Using RTX_COST with SET and 1 at least looks no worse than using >> >> >> >>> your proposed new target hook and comparing it with the original >> >> >> >>> unfolded src (again with SET and 1). >> >> >> >> >> >> >> >> It is required to generate valid instructions no matter what, before >> >> >> >> the pass has finished that is. On all more modern architectures it >> >> >> >> is >> >> >> >> futile to think you can usefully consider the cost of an RTL >> >> >> >> expression >> >> >> >> and derive a real-world cost of the generated code from that. >> >> >> > >> >> >> > Thanks Segher for pointing out these! Here is another reason that I >> >> >> > did not use rtx_cost: in a few passes, there are codes to check the >> >> >> > constants and store them in constant pool. I'm thinking to >> >> >> > integerate >> >> >> > those codes in a consistent way. >> >> >> >> >> >> Hi Segher, Richard! >> >> >> >> >> >> I'm thinking the way like: For a constant, >> >> >> 1. if the constant could be used as an immediate for the >> >> >> instruction, then retreated as an operand; >> >> >> 2. otherwise if the constant can not be stored into a >> >> >> constant pool, then handle through instructions; >> >> >> 3. if it is faster to access constant from pool, then emit >> >> >> constant as data(.rodata); >> >> >> 4. otherwise, handle the constant by instructions. >> >> >> >> >> >> And to store the constant into a pool, besides force_const_mem, >> >> >> create reference (toc) may be needed on some platforms. >> >> >> >> >> >> For this particular issue in CSE, there is already code that >> >> >> tries to put constant into a pool (invoke force_const_mem). >> >> >> While the code is too late. So, we may check the constant >> >> >> earlier and store it into constant pool if profitable. >> >> >> >> >> >> And another thing as Segher pointed out, CSE is doing too >> >> >> much work. It may be ok to separate the constant handling >> >> >> logic from CSE. >> >> > >> >> > Not sure - CSE just is value numbering, I don't see that it does >> >> > more than that. Yes, it might have developed "heuristics" over >> >> > the years what to CSE and to what and where to substitute and >> >> > where not. But in the end it does just value numbering. >> >> > >> >> >> >> >> >> I update a new version patch as follow (did not seprate CSE): >> >> > >> >> > How is the new target hook better in any way compared to rtx_cost >> >> > or insn_cost? It looks like a total hack. >> >> > >> >> > I suppose the actual way of materializing a constant is done >> >> > behind GCCs back and not exposed anywhere? But instead you >> >> > claim the constants are valid when they actually are not? >> >> > Isn't the problem then that the rs6000 backend lies? >> >> >> >> Hi Richard, >> >> >> >> Thanks for your comments and sugguestions! >> >> >> >> Materializing a constant should be done behind GCC. >> >> On rs6000, in expand pass, during emit_move, the constant is >> >> checked and store into constant pool if necessary. >> >> Some other platforms are doing a similar thing, e.g. >> >> ix86_expand_vector_move, alpha_expand_mov,... >> >> mips_legitimize_const_move. >> >> >> >> But, it does not as we expect, force_const_mem is also >> >> exposed other places (not only ira/reload for stack reference). >> >> >> >> CSE is one place, for example, CSE first retrieve the constant >> >> from insn's equal, but it also tries to put constant into >> >> pool for some condition (the condition was introduced at >> >> early age of cse.c, and it is rare to run into in trunk). >> >> In some aspects, IMHO, this seems not a great work of CSE. >> >> >> >> And this is how the 'invalid(or say slow)' constant occurs. >> >> e.g. before cse: >> >> 7: r119:DI=[unspec[`*.LC0',%r2:DI] 47] >> >> REG_EQUAL 0x100803004101001 >> >> after cse: >> >> 7: r119:DI=0x100803004101001 >> >> REG_EQUAL 0x100803004101001 >> >> >> >> As you pointed out: we can also avoid this transformation through >> >> rtx_cost/insn_cost by estimating the COST more accurately for >> >> "r119:DI=0x100803004101001". (r119:DI=0x100803004101001 will not >> >> be a real final instruction.) >> > >> > At which point does this become the final instruction? I suppose >> > CSE belives this constant is legitimate and the insn is recognized >> > just fine in CSE? (that's what I mean with "behind GCCs back") >> > >> It would become final instructions during split pass on rs6000, >> other target, e.g. alpha, seem also do split it. >> >> Is it necessary to refine this constant handling for CSE, or even >> >> to eliminate the logic on constant extracting for an insn, beside >> >> updating rtx_cost/insn_cost? >> > >> > So if CSE really just transforms >> > >> >> 7: r119:DI=[unspec[`*.LC0',%r2:DI] 47] >> >> REG_EQUAL 0x100803004101001 >> > >> > to >> > >> >> 7: r119:DI=0x100803004101001 >> >> REG_EQUAL 0x100803004101001 >> > >> > then why can't rtx_cost (SET, 1) or insn_cost () be used to >> > accurately tell it that the immediate is going to be a lot >> > more expensive? >> > >> > That is, rtx_cost (CONST_INT-rtx, DI, SET, 1, ...) is accurate >> > enough to be treated as an actual insn (it _might_ be part of >> > a parallel I guess, but that's unlikely) and thus the backend >> > should have a very good idea of the many instruction it needs >> > for this. Likewise rtx_cost ([unspec[`*.LC0',%r2:DI] 47], DI, SET, 1, >> > ...) >> > should receive accurate cost that can be compared to other >> > rtx_cost (..., DI, SET, 1, ...) >> > >> > And CSE doesn't even need to create fake insns here since IMHO >> > rtx_cost is good enough to capture the full insn. Targets can >> > choose to split out a set_cost from their rtx_cost/insn_cost >> > hooks for this case for example. >> >> Hi Richard, >> >> Right, we can fix this issue by updating rtx_cost/insn_cost to >> tell that this kind of constants is a lot of expansive. >> >> To update rtx_cost, we can use a trivial patch (shown as end of >> this mail) to fix this particular issue. >> To use insn_cost, additional work is replacing rtx_cost with >> insn_cost for CSE, maybe more suitable for stage1. >> >> So, it would be too far to fix this by refactoring the logic of >> constant handling. :-) >> >> Thanks for your comments! >> >> BR, >> Jiufu >> >> > >> > Richard. >> > >> >> Any sugguestions? >> >> >> >> > >> >> > Btw, all of this is of course not appropriate for stage4 and changes >> >> > to CSE need testing on more than one target. >> >> I would do more evaluation, thanks! >> >> >> >> Jiufu >> >> >> >> > >> >> > Richard. >> >> > >> >> >> Thanks for the comments and suggestions again! >> >> >> >> >> >> >> >> >> BR, >> >> >> Jiufu >> >> >> >> >> Part of the experimental patch for rs6000, which could help >> to mitigate inaccurate rtx cost on constant. > > Yes, that sort of thing is exactly what I had in mind. I assume > the corresponding insn_costs hook does the same?
Hi Richard! Thanks for your comments! On most targets, insn_cost also uses code like COSTS_N_INSNS, while insn_cost is also checking insn's attributes (like length,cost, size). Since cse is using rtx_cost, to use insn_cost, we would need to construct an insn first. For this idea, I hacked a patch, which is using insn_cost partially, and could be more aggressively enhanced. BR, Jiufu diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc index d7a7cfe860f..b6ca1283b18 100644 --- a/gcc/config/rs6000/rs6000.cc +++ b/gcc/config/rs6000/rs6000.cc @@ -22131,6 +22131,16 @@ rs6000_debug_rtx_costs (rtx x, machine_mode mode, int outer_code, static int rs6000_insn_cost (rtx_insn *insn, bool speed) { + /* Handle special 'invalid' insn. */ + rtx set = PATTERN (insn); + if (GET_CODE (set) == SET && CONSTANT_P (SET_SRC (set))) + { + rtx src = SET_SRC (set); + machine_mode mode = GET_MODE (SET_DEST (set)); + if (CONST_INT_P (src) || CONST_WIDE_INT_P (src) || CONST_DOUBLE_P (src)) + return COSTS_N_INSNS (num_insns_constant (src, mode)); + } + if (recog_memoized (insn) < 0) return 0; diff --git a/gcc/cse.cc b/gcc/cse.cc index a18b599d324..53fdb3fff58 100644 --- a/gcc/cse.cc +++ b/gcc/cse.cc @@ -456,6 +456,8 @@ struct table_elt (REG_P (X) ? 0 : notreg_cost (X, MODE, SET, 1)) #define COST_IN(X, MODE, OUTER, OPNO) \ (REG_P (X) ? 0 : notreg_cost (X, MODE, OUTER, OPNO)) +#define COST_SRC(I, X, MODE) \ + (REG_P (X) ? 0 : notreg_cost (X, MODE, SET, 1, I)) /* Get the number of times this register has been updated in this basic block. */ @@ -523,7 +525,8 @@ static bitmap cse_ebb_live_in, cse_ebb_live_out; static sbitmap cse_visited_basic_blocks; static bool fixed_base_plus_p (rtx x); -static int notreg_cost (rtx, machine_mode, enum rtx_code, int); +static int notreg_cost (rtx, machine_mode, enum rtx_code, int, rtx_insn*); +static int insn_cost_x (rtx_insn *, rtx); static int preferable (int, int, int, int); static void new_basic_block (void); static void make_new_qty (unsigned int, machine_mode); @@ -698,7 +701,8 @@ preferable (int cost_a, int regcost_a, int cost_b, int regcost_b) from COST macro to keep it simple. */ static int -notreg_cost (rtx x, machine_mode mode, enum rtx_code outer, int opno) +notreg_cost (rtx x, machine_mode mode, enum rtx_code outer, int opno, + rtx_insn *insn = NULL) { scalar_int_mode int_mode, inner_mode; return ((GET_CODE (x) == SUBREG @@ -709,9 +713,20 @@ notreg_cost (rtx x, machine_mode mode, enum rtx_code outer, int opno) && subreg_lowpart_p (x) && TRULY_NOOP_TRUNCATION_MODES_P (int_mode, inner_mode)) ? 0 + : insn != NULL ? insn_cost_x (insn, x) : rtx_cost (x, mode, outer, opno, optimize_this_for_speed_p) * 2); } +/* Internal function, to get cost when use X to replace source of insn + which is a SET. */ + +static int +insn_cost_x (rtx_insn *insn, rtx x) +{ + SET_SRC (PATTERN (insn)) = x; + return insn_cost (insn, optimize_this_for_speed_p); +} + /* Initialize CSE_REG_INFO_TABLE. */ @@ -4603,6 +4618,7 @@ cse_insn (rtx_insn *insn) Nothing in this loop changes the hash table or the register chains. */ + rtx_insn *tmp_insn = NULL; for (i = 0; i < n_sets; i++) { bool repeat = false; @@ -4638,6 +4654,10 @@ cse_insn (rtx_insn *insn) mode = GET_MODE (src) == VOIDmode ? GET_MODE (dest) : GET_MODE (src); sets[i].mode = mode; + if (tmp_insn == NULL_RTX && src && dest && dest != pc_rtx + && src != pc_rtx) + tmp_insn = gen_move_insn (copy_rtx (dest), src); + if (src_eqv) { machine_mode eqvmode = mode; @@ -5103,7 +5123,7 @@ cse_insn (rtx_insn *insn) src_cost = src_regcost = -1; else { - src_cost = COST (src, mode); + src_cost = COST_SRC (tmp_insn, src, mode); src_regcost = approx_reg_cost (src); } } @@ -5114,7 +5134,7 @@ cse_insn (rtx_insn *insn) src_eqv_cost = src_eqv_regcost = -1; else { - src_eqv_cost = COST (src_eqv_here, mode); + src_eqv_cost = COST_SRC (tmp_insn, src_eqv_here, mode); src_eqv_regcost = approx_reg_cost (src_eqv_here); } } @@ -5125,7 +5145,7 @@ cse_insn (rtx_insn *insn) src_folded_cost = src_folded_regcost = -1; else { - src_folded_cost = COST (src_folded, mode); + src_folded_cost = COST_SRC (tmp_insn, src_folded, mode); src_folded_regcost = approx_reg_cost (src_folded); } } @@ -5136,7 +5156,7 @@ cse_insn (rtx_insn *insn) src_related_cost = src_related_regcost = -1; else { - src_related_cost = COST (src_related, mode); + src_related_cost = COST_SRC (tmp_insn, src_related, mode); src_related_regcost = approx_reg_cost (src_related); /* If a const-anchor is used to synthesize a constant that @@ -5406,7 +5426,7 @@ cse_insn (rtx_insn *insn) src_folded = force_const_mem (mode, trial); if (src_folded) { - src_folded_cost = COST (src_folded, mode); + src_folded_cost = COST_SRC (tmp_insn, src_folded, mode); src_folded_regcost = approx_reg_cost (src_folded); } } @@ -5460,7 +5480,9 @@ cse_insn (rtx_insn *insn) /* If we had a constant that is cheaper than what we are now setting SRC to, use that constant. We ignored it when we thought we could make this into a no-op. */ - if (src_const && COST (src_const, mode) < COST (src, mode) + if (src_const + && COST_SRC (tmp_insn, src_const, mode) + < COST_SRC (tmp_insn, src, mode) && validate_change (insn, &SET_SRC (sets[i].rtl), src_const, 0)) src = src_const; > > Richard. > >> diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc >> index e82a47f4c0e..e429ae2bcf0 100644 >> --- a/gcc/config/rs6000/rs6000.cc >> +++ b/gcc/config/rs6000/rs6000.cc >> @@ -21884,6 +21884,14 @@ rs6000_rtx_costs (rtx x, machine_mode mode, int >> outer_code, >> >> case CONST_DOUBLE: >> case CONST_WIDE_INT: >> + /* Set const to a reg, it may needs a few insns. */ >> + if (outer_code == SET) >> + { >> + *total = COSTS_N_INSNS (num_insns_constant (x, mode)); >> + return true; >> + } >> + /* FALLTHRU */ >> + >> case CONST: >> case HIGH: >> case SYMBOL_REF: