Hi,

On 2020/4/18 00:32, Segher Boessenkool wrote:
> On Thu, Apr 16, 2020 at 08:21:40PM -0500, Segher Boessenkool wrote:
>> On Wed, Apr 15, 2020 at 10:18:16AM +0100, Richard Sandiford wrote:
>>> luoxhu--- via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
>>>> -    count = simplify_gen_binary (PLUS, mode, count, const1_rtx);
>>>> +    {
>>>> +      /* Fold (add -1; zero_ext; add +1) operations to zero_ext based on addop0
>>>> +   is never zero, as gimple pass loop ch will do optimization to simplify
>>>> +   the loop to NO loop for loop condition is false.  */
>>>
>>> IMO the code needs to prove this, rather than just assume that previous
>>> passes have made it so.
>>
>> Well, it should gcc_assert it, probably.
>>
>> It is the left-hand side of a+b...  it cannot be 0, because niter always
>> is simplified!
> 
> Scratch that...  it cannot be *constant* 0, but that isn't the issue here.

Sorry, my comment in the code was a bit misleading; it is actually not
related to loop-ch at all.  The instruction sequence at 255r.loop2_invariant is:

   25: NOTE_INSN_BASIC_BLOCK 5
   26: r133:SI=r123:DI#0-0x1
      REG_DEAD r123:DI
   27: r123:DI=zero_extend(r133:SI)
      REG_DEAD r133:SI
   28: r124:DI=r124:DI+0x4
   30: r134:CC=cmp(r123:DI,0)
   31: pc={(r134:CC!=0)?L69:pc}

And at 257r.loop2_doloop (#72, #73 and #74 are inserted, and #31 is replaced by #71):

;; Determined upper bound -1.
Loop 2 is simple:
  simple exit 6 -> 7
  number of iterations: (plus:SI (subreg:SI (reg:DI 123 [ doloop.6 ]) 0)
    (const_int -1 [0xffffffffffffffff]))
  upper bound: 2147483646
  likely upper bound: 2147483646
  realistic bound: -1
...
   72: r144:SI=r123:DI#0-0x1
   73: r143:DI=zero_extend(r144:SI)
   74: r142:DI=r143:DI+0x1
...
   25: NOTE_INSN_BASIC_BLOCK 5
   26: r133:SI=r123:DI#0-0x1
      REG_DEAD r123:DI
   27: r123:DI=zero_extend(r133:SI)
      REG_DEAD r133:SI
   28: r124:DI=r124:DI+0x4
   30: r134:CC=cmp(r123:DI,0)
   71: {pc={(r142:DI!=0x1)?L69:pc};r142:DI=r142:DI-0x1;clobber scratch;clobber scratch;}

increment_count being true ensures that the condition is (NE const1_rtx).
r123:DI#0-0x1 is the number of loop iterations in the doloop, so it is
definitely >= 0, and r123:DI#0 also shouldn't be zero, as the loop upper
bound is 2147483646 (0x7ffffffe)?

Since this simplification is in doloop_modify, and doloop_valid_p already
checks the doloop form (!desc->simple_p || desc->assumptions
|| desc->infinite), it seems unnecessary to repeat those checks here?
Maybe we just need to check that the loop upper bound is LEU 0x7ffffffe,
to avoid the case where instruction #26 overflows?
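
The arithmetic behind that question can be checked outside GCC.  Below is a
minimal sketch in plain C, not GCC code: sub_zext_add models insns #72-#74,
zext_only models the proposed replacement, and niter_within_bound is a
hypothetical stand-in for the wi::leu_p test (all three names, and check_fold,
are made up for illustration).  It assumes the iteration count is the 32-bit
value x - 1.

```c
#include <assert.h>
#include <stdint.h>

/* Models insns #72-#74: SImode subtract, zero_extend to DImode, DImode add.  */
static uint64_t
sub_zext_add (uint32_t x)
{
  uint32_t niter = (uint32_t) (x - 1u);  /* r144:SI = r123:DI#0 - 1, wraps if x == 0.  */
  uint64_t ext = (uint64_t) niter;       /* r143:DI = zero_extend (r144:SI).  */
  return ext + 1u;                       /* r142:DI = r143:DI + 1.  */
}

/* The proposed replacement: a single zero_extend of the low 32 bits.  */
static uint64_t
zext_only (uint32_t x)
{
  return (uint64_t) x;
}

/* Hypothetical stand-in for the upper-bound test: the 32-bit iteration
   count x - 1 must be LEU 0x7ffffffe, which excludes x == 0.  */
static int
niter_within_bound (uint32_t x)
{
  return (uint32_t) (x - 1u) <= 0x7ffffffeu;
}

/* Assert that the fold is exact whenever the bound holds.  */
static void
check_fold (uint32_t x)
{
  if (niter_within_bound (x))
    assert (sub_zext_add (x) == zext_only (x));
}
```

For x == 0 the subtract wraps to 0xffffffff, so the three-insn sequence yields
0x100000000 while a bare zero_extend yields 0; that is the one input the bound
has to exclude.  The bound is conservative, since it also rejects
x >= 0x80000000, where the two forms would still agree.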


Updated patch, thanks:


This "subtract/extend/add" sequence has existed for a long time and still
annoys us (PR37451, PR61837) when converting from 32 bits to 64 bits, as
the ctr register is 64 bits wide on powerpc64.  Andrew Pinski had a patch,
but it caused some issues and was reverted by Joseph S. Myers (PR37451,
PR37782).

Andrew:
http://gcc.gnu.org/ml/gcc-patches/2008-09/msg01070.html
http://gcc.gnu.org/ml/gcc-patches/2008-10/msg01321.html
Joseph:
https://gcc.gnu.org/legacy-ml/gcc-patches/2011-11/msg02405.html

However, the doloop code has improved a lot over the years:
gcc.c-torture/execute/doloop-1.c is no longer a simple loop with a constant
desc->niter_expr, since r125:SI#0 is not SImode, so it is not a valid doloop
and the doloop pass no longer transforms it.  Thus we can simplify
"subtract/extend/add" to just the extend when the loop upper bound is known
to be LE SINT_MAX-1 (the simplification is not done when r120:DI#0-0x1 may
overflow).

Bootstrapped and regression tested on Power8-LE.

gcc/ChangeLog

        2020-04-20  Xiong Hu Luo  <luo...@linux.ibm.com>

        PR rtl-optimization/37451, PR target/61837
        * loop-doloop.c (doloop_modify): Simplify (add -1; zero_ext; add +1)
        to zero_ext.
---
 gcc/loop-doloop.c | 41 ++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 40 insertions(+), 1 deletion(-)

diff --git a/gcc/loop-doloop.c b/gcc/loop-doloop.c
index db6a014e43d..da537aff60f 100644
--- a/gcc/loop-doloop.c
+++ b/gcc/loop-doloop.c
@@ -477,7 +477,46 @@ doloop_modify (class loop *loop, class niter_desc *desc,
     }
 
   if (increment_count)
-    count = simplify_gen_binary (PLUS, mode, count, const1_rtx);
+    {
+      /* Fold (add -1; zero_ext; add +1) operations to zero_ext, i.e.:
+
+        73: r145:SI=r123:DI#0-0x1
+        74: r144:DI=zero_extend(r145:SI)
+        75: r143:DI=r144:DI+0x1
+        ...
+        31: r135:CC=cmp(r123:DI,0)
+        72: {pc={(r143:DI!=0x1)?L70:pc};r143:DI=r143:DI-0x1;clobber
+        scratch;clobber scratch;}
+
+        r123:DI#0-0x1 is the loop iteration count, which is GE 0; r143 is the
+        loop count to be saved to ctr.  If this loop's upper bound is known,
+        r123:DI#0 won't be zero, so the expressions can be simplified to
+        zero_extend only.  */
+      bool simplify_zext = false;
+      rtx extop0 = XEXP (count, 0);
+      if (loop->any_upper_bound
+         && wi::leu_p (loop->nb_iterations_upper_bound, 0x7ffffffe)
+         && GET_CODE (count) == ZERO_EXTEND
+         && GET_CODE (extop0) == PLUS)
+       {
+         rtx addop0 = XEXP (extop0, 0);
+         rtx addop1 = XEXP (extop0, 1);
+
+         unsigned int_mode
+           = GET_MODE_PRECISION (as_a<scalar_int_mode> (GET_MODE (addop0)));
+         if (CONST_SCALAR_INT_P (addop1)
+             && GET_MODE_PRECISION (mode) == int_mode * 2
+             && addop1 == GEN_INT (-1))
+           {
+             count = simplify_gen_unary (ZERO_EXTEND, mode, addop0,
+                                         GET_MODE (addop0));
+             simplify_zext = true;
+           }
+       }
+
+      if (!simplify_zext)
+       count = simplify_gen_binary (PLUS, mode, count, const1_rtx);
+    }
 
   /* Insert initialization of the count register into the loop header.  */
   start_sequence ();
-- 
2.21.0.777.g83232e3864

