> -----Original Message-----
> From: Richard Biener [mailto:[email protected]]
> Sent: 08 January 2014 14:42
> To: Paulo Matos
> Cc: Andrew Haley; [email protected]; Jan Hubicka
> Subject: Re: Infinite number of iterations in loop [v850, mep]
>
> On Wed, Jan 8, 2014 at 3:09 PM, Paulo Matos <[email protected]> wrote:
> >> -----Original Message-----
> >> From: Richard Biener [mailto:[email protected]]
> >> Sent: 08 January 2014 11:03
> >> To: Paulo Matos
> >> Cc: Andrew Haley; [email protected]
> >> Subject: Re: Infinite number of iterations in loop [v850, mep]
> >>
> >> That was refering to the case with extern b. For the above case the
> >> issue must be sth else. Trying a cross to v850-elf to see if it
> >> reproduces for me (if 'b' is a stack or argument slot then we might
> >> bogously think that *c++ = 0 may clobber it, otherwise RTL
> >> number of iteration analysis might just be confused).
> >>
> >> So for example (should be arch independent)
> >>
> >> struct X { int i; int j; int k; int l[24]; };
> >>
> >> int foo (struct X x, int *p)
> >> {
> >> int z = x.j;
> >> *p = 1;
> >> return z;
> >> }
> >>
> >> see if there is a anti-dependence between x.j and *p on the RTL side
> >> (at least the code dispatching to the tree oracle using the MEM_EXPRs
> >> should save you from that).
> >>
> >> So - v850 at least doesn't pass b in memory and the doloop recognition
> >> works for me (on trunk).
> >>
> >
> > You are right, everything is fine with the above example regarding the anti-
> dependence and with the loop as well. I got confused with mine not generating
> a
> loop for
> > void fn1 (unsigned int b)
> > {
> > unsigned int a;
> > for (a = 0; a < b; a++)
> > *c++ = 0;
> > }
> >
> > but that simply because in our case it is not profitable.
> >
> > However, for the case:
> > void matrix_add_const(unsigned int N, short *A, short val) {
> > unsigned int i,j;
> > for (i=0; i<N; i++) {
> > for (j=0; j<N; j++) {
> > A[i*N+j] += val;
> > }
> > }
> > }
> >
> > GCC thinks for v850 and my port that the inner loop might be infinite.
> > It looks like GCC is mangling the loop so much that the obviousness that the
> inner loop is finite is lost.
> >
> > This however turns out to be very performance degrading. Using -fno-ivopts
> makes generation of loops work again both in my port and v850.
> > Is there a way to fine-tune ivopts besides trying to tune the costs or do
> > you
> reckon this is something iv-analysis should be smarter about?
>
> Well. We have
>
> Loop 2 is simple:
> simple exit 5 -> 7
> infinite if: (expr_list:REG_DEP_TRUE (and:SI (reg:SI 76)
> (const_int 1 [0x1]))
> (nil))
> number of iterations: (lshiftrt:SI (plus:SI (minus:SI (reg:SI 68 [ D.1398 ])
> (reg:SI 64 [ ivtmp___6 ]))
> (const_int -2 [0xfffffffffffffffe]))
> (const_int 1 [0x1]))
> upper bound: 2147483646
> realistic bound: -1
> Doloop: Possible infinite iteration case.
> Doloop: The loop is not suitable.
>
> as we replaced the induction variable by a pointer induction with
> step 2. So this might be a very common issue for RTL loop opts,
> the upper bound of the IV is 2 * N in this case, so 2 * N & 1
> should be always false and thus "infinite" be optimized.
>
> (insn 34 33 36 3 (parallel [
> (set (reg:SI 76)
> (plus:SI (reg/v:SI 71 [ N ])
> (reg/v:SI 71 [ N ])))
> (clobber (reg:CC 32 psw))
> ]) 21 {addsi3}
> (expr_list:REG_UNUSED (reg:CC 32 psw)
> (nil)))
>
> that doesn't look too difficult to do with the above definition.
> nonzero_bits might be of use here, not sure (not my area of
> expertise).
>
I would like some comments on the following patch that seems to work but I
think it could be generalized.
The idea is for the specific infinite condition of type (and reg int), we can
search for the definition of reg,
check nonzero_bits and check that they don't match any of the bits in int.
diff --git a/gcc/loop-iv.c b/gcc/loop-iv.c
index 4c34007..215fd22 100644
--- a/gcc/loop-iv.c
+++ b/gcc/loop-iv.c
@@ -2064,6 +2064,50 @@ simplify_using_initial_values (struct loop *loop, enum
rtx_code op, rtx *expr)
e = single_pred_edge (e->src);
}
+ /* For certain patterns we can do even better, like (and (reg) 1). */
+ if (GET_CODE (*expr) == AND
+ && REG_P (XEXP (*expr, 0))
+ && CONST_INT_P (XEXP (*expr, 1)))
+ {
+ rtx reg = XEXP (*expr, 0);
+ unsigned HOST_WIDE_INT mask = INTVAL (XEXP (*expr, 1));
+ rtx insn_def = NULL_RTX;
+ basic_block bb = loop_preheader_edge (loop)->src;
+
+ while (1)
+ {
+ rtx insn;
+
+ if (bb == ENTRY_BLOCK_PTR)
+ break;
+
+ FOR_BB_INSNS_REVERSE (bb, insn)
+ {
+ if (!INSN_P (insn))
+ break;
+ if (df_reg_defined (insn, reg))
+ {
+ insn_def = insn;
+ break;
+ }
+ }
+
+ if (insn_def)
+ break;
+ bb = get_immediate_dominator (CDI_DOMINATORS, bb);
+ }
+
+ if (insn_def)
+ {
+ rtx set = single_set (insn_def);
+ unsigned HOST_WIDE_INT nonzero = nonzero_bits (SET_SRC (set),
GET_MODE (reg));
+
+ /* Assumption will never occur and can be optimized. */
+ if ((mask & nonzero) == 0)
+ *expr = const0_rtx;
+ }
+ }
+
out:
free_EXPR_LIST_list (&cond_list);
if (!CONSTANT_P (*expr))
> Richard.
>
> > Paulo Matos
> >
> >> Richard.
> >>
> >> > The same situation occurs with a coremark function:
> >> > void matrix_add_const(ee_u32 N, MATDAT *A, MATDAT val) {
> >> > ee_u32 i,j;
> >> > for (i=0; i<N; i++) {
> >> > for (j=0; j<N; j++) {
> >> > A[i*N+j] += val;
> >> > }
> >> > }
> >> > }
> >> >
> >> > It also says the inner loop might be infinite but it can't N is given as
> >> argument. j starts as 0, N is unsigned like N. This will loop N times. GCC
> cannot
> >> possibly assume array A will overwrite the value of N in the loop. This
> >> seems
> >> like something is wrong in alias analysis.
> >> >
> >> >> --
> >> >> PMatos