Hi!
Richard Biener <rguent...@suse.de> writes: > On Wed, 9 Mar 2022, Jiufu Guo wrote: > >> >> Hi! >> >> Richard Biener <rguent...@suse.de> writes: >> >> > On Tue, 8 Mar 2022, Jiufu Guo wrote: >> > >> >> Jiufu Guo <guoji...@linux.ibm.com> writes: >> >> >> >> Hi! >> >> >> >> > Hi Sehger, >> >> > >> >> > Segher Boessenkool <seg...@kernel.crashing.org> writes: >> >> > >> >> >> On Tue, Mar 01, 2022 at 10:28:57PM +0800, Jiufu Guo wrote: >> >> >>> Segher Boessenkool <seg...@kernel.crashing.org> writes: >> >> >>> > No. insn_cost is only for correct, existing instructions, not for >> >> >>> > made-up nonsense. I created insn_cost precisely to get away from >> >> >>> > that >> >> >>> > aspect of rtx_cost (and some other issues, like, it is incredibly >> >> >>> > hard >> >> >>> > and cumbersome to write a correct rtx_cost). >> >> >>> >> >> >>> Thanks! The implementations of hook insn_cost are align with this >> >> >>> design, they are checking insn's attributes and COSTS_N_INSNS. >> >> >>> >> >> >>> One question on the speciall case: >> >> >>> For instruction: "r119:DI=0x100803004101001" >> >> >>> Would we treat it as valid instruction? >> >> >> >> >> >> Currently we do, alternative 6 in *movdi_internal64: we allow any r<-n. >> >> >> This is costed as 5 insns (cost=20). >> >> >> >> >> >> It generally is better to split things into patterns close to the >> >> >> eventual machine isntructions as early as possible: all the more >> >> >> generic >> >> >> optimisations can take advantage of that then. >> >> > Get it! >> >> >> >> >> >>> A patch, which is attached the end of this mail, accepts >> >> >>> "r119:DI=0x100803004101001" as input of insn_cost. >> >> >>> In this patch, >> >> >>> - A tmp instruction is generated via make_insn_raw. >> >> >>> - A few calls to rtx_cost (in cse_insn) is replaced by insn_cost. >> >> >>> - In hook of insn_cost, checking the special 'constant' instruction. >> >> >>> Are these make sense? >> >> >> >> >> >> I'll review that patch inline. >> >> >> >> I drafted a new patch that replace rtx_cost with insn_cost for cse.cc. >> >> Different from the previous partial patch, this patch replaces all usage >> >> of rtx_cost. It may be better/aggressive than previous one. >> > >> > I think there's no advantage for using insn_cost over rtx_cost for >> > the simple SET case. >> >> Thanks for your comments and raise this concern. >> >> For those targets which do not implement insn_cost, insn_cost calls >> rtx_cost through pattern_cost, then insn_cost is equal to rtx_cost. >> >> While, for those targets which have insn_cost, it seems insn_cost would >> be better(or say more accurate/consistent?) than rtx_cost. Since: >> - insn_cost recog the insn first, and compute cost through something > > target hooks are expected to call recog on the insn but the generic > fallback does not!? Or do you say a target _could_ call recog? > I think it would be valid to only expect recognized insns here > and thus cse.cc obligation would be to call regoc on the "fake" > instruction which then raises the obvious issue whether you should > ever call recog on something "fake". Thanks Richard! I also feel this is a main point of insn_cost. >From my understanding: it would be better to let insn_cost check the valid recognizable insns; this would be the major purpose of insn_cost. While, I'm also wondering, we may let it go for 'fake' instruction for current implementations (e.g. arm_insn_cost) which behaviors to walk/check pattern. > > I also see that rs6000_insn_cost does > > static int > rs6000_insn_cost (rtx_insn *insn, bool speed) > { > if (recog_memoized (insn) < 0) > return 0; > > so not recognized insns become quite cheap - it looks like > insn_cost has no documented failure mode and initial implementors > chickened out asserting that an insn is / can be recognized. > In fact I'd assert that INSN_CODE (insn) >= 0 simply because there's > no failure mode and the _caller_ should have made sure the > insn can be recognized. That said, the insn_cost should do that. Thanks! Using the assert would help to catch un-recog failures. > (is INSN_CODE () == 0 a real thing?) 0 is specialy, it is some thing like CODE_FOR_nothing. :-) > >> (like length/cost attributes from .md file) for the 'machine insn'. >> - rtx_cost estimates the cost through analyzing the 'rtx content'. >> The accurate estimation relates to the context. >> >> For a special example: "%r100 = C", as a previous patch, by tunning >> target's rtx_cost hook, cost could be computed according to the value >> of C. insn_cost may just model the cost in the define of the machine >> instruction. >> >> These reasons are my initial thoughts. Segher may have better >> explain. :-) > > OK, so in theory the targets insn_cost can use insn attributes > which allows to keep the cost parts of an instruction close to the > instruction patterns which is probably a good thing for > maintainability. But as you point out this requires recognizing > of possibly generated instructions. And before reload it has > the issue that the alternative is not determined which means the > true cost cannot be determined without walking the pattern, > like on x86 where many instruction operands have both register > and memory alternatives. Right, when there is no alternative fits for true cost, it may have to check the pattern like 'rtx_cost' then. :-) BR, Jiufu > >> To replace rtx_cost with insn_cost, this patch build a SET instruction: >> "%r = rtx_expr", then using "%r = rtx_expr" from insn_cost to simulate >> the cost of "rtx_expr" from rtx_cost. >> >> >> BR, >> Jiufu >> >> > >> > Richard. >> > >> >> With this patch, bootstrap pass. >> >> From regtest, only output of fusion-p10-ldcmpi.c is changed, and the >> >> change seems as expected. >> >> >> >> >> >> BR, >> >> Jiufu >> >> >> >> diff --git a/gcc/cse.cc b/gcc/cse.cc >> >> index a18b599d324..e623ad298db 100644 >> >> --- a/gcc/cse.cc >> >> +++ b/gcc/cse.cc >> >> @@ -262,6 +262,9 @@ static struct qty_table_elem *qty_table; >> >> static rtx_insn *this_insn; >> >> static bool optimize_this_for_speed_p; >> >> >> >> +/* Used for insn_cost. */ >> >> +static rtx_insn *estimate_insn; >> >> + >> >> /* Index by register number, gives the number of the next (or >> >> previous) register in the chain of registers sharing the same >> >> value. >> >> @@ -445,7 +448,7 @@ struct table_elt >> >> /* Compute cost of X, as stored in the `cost' field of a table_elt. >> >> Fixed >> >> hard registers and pointers into the frame are the cheapest with a >> >> cost >> >> of 0. Next come pseudos with a cost of one and other hard registers >> >> with >> >> - a cost of 2. Aside from these special cases, call `rtx_cost'. */ >> >> + a cost of 2. Aside from these special cases, call `insn_cost'. */ >> >> >> >> #define CHEAP_REGNO(N) >> >> \ >> >> (REGNO_PTR_FRAME_P (N) \ >> >> @@ -698,18 +701,33 @@ preferable (int cost_a, int regcost_a, int cost_b, >> >> int regcost_b) >> >> from COST macro to keep it simple. */ >> >> >> >> static int >> >> -notreg_cost (rtx x, machine_mode mode, enum rtx_code outer, int opno) >> >> +notreg_cost (rtx x, machine_mode mode, enum rtx_code /*outer*/, int >> >> /*opno*/) >> >> { >> >> scalar_int_mode int_mode, inner_mode; >> >> - return ((GET_CODE (x) == SUBREG >> >> - && REG_P (SUBREG_REG (x)) >> >> - && is_int_mode (mode, &int_mode) >> >> - && is_int_mode (GET_MODE (SUBREG_REG (x)), &inner_mode) >> >> - && GET_MODE_SIZE (int_mode) < GET_MODE_SIZE (inner_mode) >> >> - && subreg_lowpart_p (x) >> >> - && TRULY_NOOP_TRUNCATION_MODES_P (int_mode, inner_mode)) >> >> - ? 0 >> >> - : rtx_cost (x, mode, outer, opno, optimize_this_for_speed_p) * 2); >> >> + if (GET_CODE (x) == SUBREG && REG_P (SUBREG_REG (x)) >> >> + && is_int_mode (mode, &int_mode) >> >> + && is_int_mode (GET_MODE (SUBREG_REG (x)), &inner_mode) >> >> + && GET_MODE_SIZE (int_mode) < GET_MODE_SIZE (inner_mode) >> >> + && subreg_lowpart_p (x) >> >> + && TRULY_NOOP_TRUNCATION_MODES_P (int_mode, inner_mode)) >> >> + return 0; >> >> + >> >> + if (estimate_insn == NULL) >> >> + { >> >> + estimate_insn = make_insn_raw ( >> >> + gen_rtx_SET (gen_rtx_REG (mode, LAST_VIRTUAL_REGISTER + 1), x)); >> >> + SET_PREV_INSN (estimate_insn) = NULL_RTX; >> >> + SET_NEXT_INSN (estimate_insn) = NULL_RTX; >> >> + INSN_LOCATION (estimate_insn) = 0; >> >> + } >> >> + else >> >> + { >> >> + /* Update for new context. */ >> >> + INSN_CODE (estimate_insn) = -1; >> >> + PUT_MODE (SET_DEST (PATTERN (estimate_insn)), mode); >> >> + SET_SRC (PATTERN (estimate_insn)) = x; >> >> + } >> >> + return insn_cost (estimate_insn, optimize_this_for_speed_p); >> >> } >> >> >> >> >> >> @@ -6667,6 +6685,7 @@ cse_main (rtx_insn *f ATTRIBUTE_UNUSED, int nregs) >> >> >> >> init_recog (); >> >> init_alias_analysis (); >> >> + estimate_insn = NULL; >> >> >> >> reg_eqv_table = XNEWVEC (struct reg_eqv_elem, nregs); >> >> >> >> >>