Hi,

On 2020/4/18 00:32, Segher Boessenkool wrote:
> On Thu, Apr 16, 2020 at 08:21:40PM -0500, Segher Boessenkool wrote:
>> On Wed, Apr 15, 2020 at 10:18:16AM +0100, Richard Sandiford wrote:
>>> luoxhu--- via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
>>>> -    count = simplify_gen_binary (PLUS, mode, count, const1_rtx);
>>>> +    {
>>>> +      /* Fold (add -1; zero_ext; add +1) operations to zero_ext based on addop0
>>>> +         is never zero, as gimple pass loop ch will do optimization to simplify
>>>> +         the loop to NO loop for loop condition is false.  */
>>>
>>> IMO the code needs to prove this, rather than just assume that previous
>>> passes have made it so.
>>
>> Well, it should gcc_assert it, probably.
>>
>> It is the left-hand side of a+b...  it cannot be 0, because niter always
>> is simplified!
>
> Scratch that...  it cannot be *constant* 0, but that isn't the issue here.

Sorry, my comments in the code were a bit misleading; this is actually not
related to loop-ch at all.  The instruction sequence at 255r.loop2_invariant:

   25: NOTE_INSN_BASIC_BLOCK 5
   26: r133:SI=r123:DI#0-0x1
      REG_DEAD r123:DI
   27: r123:DI=zero_extend(r133:SI)
      REG_DEAD r133:SI
   28: r124:DI=r124:DI+0x4
   30: r134:CC=cmp(r123:DI,0)
   31: pc={(r134:CC!=0)?L69:pc}

And 257r.loop2_doloop (#72, #73 and #74 are inserted, and #31 is replaced
by #71):

;; Determined upper bound -1.
Loop 2 is simple:
  simple exit 6 -> 7
  number of iterations: (plus:SI (subreg:SI (reg:DI 123 [ doloop.6 ]) 0)
    (const_int -1 [0xffffffffffffffff]))
  upper bound: 2147483646
  likely upper bound: 2147483646
  realistic bound: -1
...
   72: r144:SI=r123:DI#0-0x1
   73: r143:DI=zero_extend(r144:SI)
   74: r142:DI=r143:DI+0x1
...
   25: NOTE_INSN_BASIC_BLOCK 5
   26: r133:SI=r123:DI#0-0x1
      REG_DEAD r123:DI
   27: r123:DI=zero_extend(r133:SI)
      REG_DEAD r133:SI
   28: r124:DI=r124:DI+0x4
   30: r134:CC=cmp(r123:DI,0)
   71: {pc={(r142:DI!=0x1)?L69:pc};r142:DI=r142:DI-0x1;clobber scratch;clobber scratch;}

increment_count being true ensures the condition is (NE const1_rtx).
r123:DI#0-0x1 is the number of loop iterations in doloop, so it is
definitely >= 0, and r123:DI#0 also shouldn't be zero, as the loop upper
bound is 2147483646 (0x7ffffffe)???  Since this simplification is in
doloop_modify, and doloop_valid_p already checks the doloop form with
!desc->simple_p || desc->assumptions || desc->infinite, it seems
unnecessary to repeat those checks here?  Maybe we just need to check that
the loop upper bound is LEU 0x7ffffffe, to rule out overflow in
instruction #26?


Updated patch, thanks:


This "subtract/extend/add" sequence has existed for a long time and is
still annoying us (PR37451, PR61837) when converting from 32 bits to
64 bits, as the ctr register is used as 64 bits on powerpc64.  Andrew Pinski
had a patch, but it caused some issues and was reverted by Joseph S. Myers
(PR37451, PR37782).

Andrew: http://gcc.gnu.org/ml/gcc-patches/2008-09/msg01070.html
http://gcc.gnu.org/ml/gcc-patches/2008-10/msg01321.html
Joseph: https://gcc.gnu.org/legacy-ml/gcc-patches/2011-11/msg02405.html

However, the doloop code has improved a lot over the years.
gcc.c-torture/execute/doloop-1.c is no longer a simple loop with constant
desc->niter_expr since r125:SI#0 is not SImode, so it is not a valid doloop
and no transform is done in doloop again.  Thus we can simplify
"subtract/extend/add" to only extend when the loop upper_bound is known to
be LE than SINT_MAX-1 (do not simplify when r120:DI#0-0x1 overflows).

Bootstrap and regression tests pass on Power8-LE.

gcc/ChangeLog

	2020-04-20  Xiong Hu Luo  <luo...@linux.ibm.com>

	PR rtl-optimization/37451, PR target/61837
	* loop-doloop.c (doloop_modify): Simplify (add -1; zero_ext; add +1)
	to zero_ext.
---
 gcc/loop-doloop.c | 41 ++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 40 insertions(+), 1 deletion(-)

diff --git a/gcc/loop-doloop.c b/gcc/loop-doloop.c
index db6a014e43d..da537aff60f 100644
--- a/gcc/loop-doloop.c
+++ b/gcc/loop-doloop.c
@@ -477,7 +477,46 @@ doloop_modify (class loop *loop, class niter_desc *desc,
     }
 
   if (increment_count)
-    count = simplify_gen_binary (PLUS, mode, count, const1_rtx);
+    {
+      /* Fold (add -1; zero_ext; add +1) operations to zero_ext. i.e:
+
+         73: r145:SI=r123:DI#0-0x1
+         74: r144:DI=zero_extend(r145:SI)
+         75: r143:DI=r144:DI+0x1
+         ...
+         31: r135:CC=cmp(r123:DI,0)
+         72: {pc={(r143:DI!=0x1)?L70:pc};r143:DI=r143:DI-0x1;clobber
+         scratch;clobber scratch;}
+
+         r123:DI#0-0x1 is the loop iteration count and is GE 0; r143 is the
+         loop count saved to ctr.  If this loop's upper bound is known,
+         r123:DI#0 won't be zero, so the expressions can be simplified to
+         zero_extend only.  */
+      bool simplify_zext = false;
+      rtx extop0 = XEXP (count, 0);
+      if (loop->any_upper_bound
+          && wi::leu_p (loop->nb_iterations_upper_bound, 0x7ffffffe)
+          && GET_CODE (count) == ZERO_EXTEND
+          && GET_CODE (extop0) == PLUS)
+        {
+          rtx addop0 = XEXP (extop0, 0);
+          rtx addop1 = XEXP (extop0, 1);
+
+          unsigned int_mode
+            = GET_MODE_PRECISION (as_a<scalar_int_mode> GET_MODE (addop0));
+          if (CONST_SCALAR_INT_P (addop1)
+              && GET_MODE_PRECISION (mode) == int_mode * 2
+              && addop1 == GEN_INT (-1))
+            {
+              count = simplify_gen_unary (ZERO_EXTEND, mode, addop0,
+                                          GET_MODE (addop0));
+              simplify_zext = true;
+            }
+        }
+
+      if (!simplify_zext)
+        count = simplify_gen_binary (PLUS, mode, count, const1_rtx);
+    }
 
   /* Insert initialization of the count register into the loop header.  */
   start_sequence ();
-- 
2.21.0.777.g83232e3864
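
For reference, the arithmetic behind the fold can be checked outside of GCC.
The standalone C sketch below is not part of the patch; the helper names
(count_unfolded, count_folded, niter_within_bound, fold_self_check) are
hypothetical and only model the SImode/DImode arithmetic with uint32_t and
uint64_t.  It assumes the recorded upper bound applies to the 32-bit value
n - 1.  Under that assumption, n - 1 <= 0x7ffffffe excludes n == 0, which is
the only input for which "subtract/extend/add" and a plain zero_extend
disagree.

```c
#include <assert.h>
#include <stdint.h>

/* Unfolded form: subtract 1 in 32 bits, zero-extend to 64 bits, add 1.
   Models insns 73/74/75 from the comment in the patch.  */
static uint64_t
count_unfolded (uint32_t n)
{
  uint32_t niter = n - 1u;	/* Wraps to 0xffffffff when n == 0.  */
  return (uint64_t) niter + 1u;
}

/* Folded form: a plain zero-extension of the 32-bit value.  */
static uint64_t
count_folded (uint32_t n)
{
  return (uint64_t) n;
}

/* Illustrative guard (an assumption, not the GCC API): true when the
   32-bit iteration count n - 1 is LEU 0x7ffffffe, the same constant the
   patch compares the loop upper bound against.  */
static int
niter_within_bound (uint32_t n)
{
  return (uint32_t) (n - 1u) <= 0x7ffffffeu;
}

/* Check both forms on a few sample values.  */
static void
fold_self_check (void)
{
  const uint32_t samples[] = { 1u, 2u, 1000u, 0x7fffffffu };
  for (unsigned i = 0; i < sizeof samples / sizeof samples[0]; i++)
    {
      uint32_t n = samples[i];
      assert (niter_within_bound (n));
      assert (count_unfolded (n) == count_folded (n));
    }

  /* n == 0 is rejected by the guard, and the two forms really differ.  */
  assert (!niter_within_bound (0u));
  assert (count_unfolded (0u) == UINT64_C (0x100000000));
  assert (count_folded (0u) == 0);
}
```

Calling fold_self_check () from a small main () is enough to exercise it.
In this model any bound below 0xffffffff would already exclude n == 0; the
sketch uses 0x7ffffffe only because that is the constant the patch uses.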