Richard Biener <rguent...@suse.de> writes:
> So I've come back to PR66313 and found a solution to the tailrecursion
> missed optimization when fixing the factoring folding to use an unsigned
> type when we're not sure of overflow.
>
> The folding part is identical to my last try from 2015; the tailrecursion
> part makes us handle intermittent stmts that were introduced by foldings
> that "clobber" our quest walking the single-use chain of stmts between
> the call and the return (and failing at all stmts that are not part
> of said chain).  A simple solution is to move the stmts that are not
> part of the chain, and that we can move, before the call.  That handles
> the leaf conversions that now appear for tree-ssa/tailrecursion-6.c.
>
> Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.
>
> Richard.
>
> 2017-05-31  Richard Biener  <rguent...@suse.de>
>
> 	PR middle-end/66313
> 	* fold-const.c (fold_plusminus_mult_expr): If the factored
> 	factor may be zero use a wrapping type for the inner operation.
> 	* tree-tailcall.c (independent_of_stmt_p): Pass in to_move bitmap
> 	and handle moved defs.
> 	(process_assignment): Properly guard the unary op case.  Return a
> 	tri-state indicating that moving the stmt before the call may allow
> 	to continue.  Pass through to_move.
> 	(find_tail_calls): Handle moving unrelated defs before
> 	the call.
>
> 	* c-c++-common/ubsan/pr66313.c: New testcase.
> 	* gcc.dg/tree-ssa/loop-15.c: Adjust.
>
> Index: gcc/fold-const.c
> ===================================================================
> *** gcc/fold-const.c.orig	2015-10-29 12:32:33.302782318 +0100
> --- gcc/fold-const.c	2015-10-29 14:08:39.936497739 +0100
> *************** fold_plusminus_mult_expr (location_t loc
> *** 6916,6925 ****
>   	}
>         same = NULL_TREE;
>   
> !   if (operand_equal_p (arg01, arg11, 0))
> !     same = arg01, alt0 = arg00, alt1 = arg10;
> !   else if (operand_equal_p (arg00, arg10, 0))
>       same = arg00, alt0 = arg01, alt1 = arg11;
>     else if (operand_equal_p (arg00, arg11, 0))
>       same = arg00, alt0 = arg01, alt1 = arg10;
>     else if (operand_equal_p (arg01, arg10, 0))
> --- 6916,6926 ----
>   	}
>         same = NULL_TREE;
>   
> !   /* Prefer factoring a common non-constant.  */
> !   if (operand_equal_p (arg00, arg10, 0))
>       same = arg00, alt0 = arg01, alt1 = arg11;
> +   else if (operand_equal_p (arg01, arg11, 0))
> +     same = arg01, alt0 = arg00, alt1 = arg10;
>     else if (operand_equal_p (arg00, arg11, 0))
>       same = arg00, alt0 = arg01, alt1 = arg10;
>     else if (operand_equal_p (arg01, arg10, 0))
> *************** fold_plusminus_mult_expr (location_t loc
> *** 6974,6987 ****
>   	}
>       }
>   
> !   if (same)
>       return fold_build2_loc (loc, MULT_EXPR, type,
>   			    fold_build2_loc (loc, code, type,
>   					     fold_convert_loc (loc, type, alt0),
>   					     fold_convert_loc (loc, type, alt1)),
>   			    fold_convert_loc (loc, type, same));
>   
> !   return NULL_TREE;
>   }
>   
>   /* Subroutine of native_encode_expr.  Encode the INTEGER_CST
> --- 6975,7010 ----
>   	}
>       }
>   
> !   if (!same)
> !     return NULL_TREE;
> !
> !   if (! INTEGRAL_TYPE_P (type)
> !       || TYPE_OVERFLOW_WRAPS (type)
> !       /* We are neither factoring zero nor minus one.  */
> !       || TREE_CODE (same) == INTEGER_CST)
>       return fold_build2_loc (loc, MULT_EXPR, type,
>   			    fold_build2_loc (loc, code, type,
>   					     fold_convert_loc (loc, type, alt0),
>   					     fold_convert_loc (loc, type, alt1)),
>   			    fold_convert_loc (loc, type, same));
>   
> !   /* Same may be zero and thus the operation 'code' may overflow.  Likewise
> !      same may be minus one and thus the multiplication may overflow.  Perform
> !      the operations in an unsigned type.  */
> !   tree utype = unsigned_type_for (type);
> !   tree tem = fold_build2_loc (loc, code, utype,
> !			       fold_convert_loc (loc, utype, alt0),
> !			       fold_convert_loc (loc, utype, alt1));
> !   /* If the sum evaluated to a constant that is not -INF the multiplication
> !      cannot overflow.  */
> !   if (TREE_CODE (tem) == INTEGER_CST
> !       && ! wi::eq_p (tem, wi::min_value (TYPE_PRECISION (utype), SIGNED)))
> !     return fold_build2_loc (loc, MULT_EXPR, type,
> !			     fold_convert (type, tem), same);
> !
> !   return fold_convert_loc (loc, type,
> !			    fold_build2_loc (loc, MULT_EXPR, utype, tem,
> !					     fold_convert_loc (loc, utype,
> !							       same)));
>   }
>   
>   /* Subroutine of native_encode_expr.  Encode the INTEGER_CST
Sorry if you already know, but this part means that we can no longer
vectorise:

int
f (int *x, int b1, int b2, int b3)
{
  int foo = 0;
  for (int i1 = 0; i1 < b1; ++i1)
    for (int i2 = 0; i2 < b2; ++i2)
      for (int i3 = 0; i3 < b3; ++i3)
	foo += x[i1 * b2 * b3 + i2 * b3 + (i3 - 1)];
  return foo;
}

We now convert all the arithmetic in the [...] to unsigned int
and reassociate it so that the "- 1" is applied last.  We then
assume that the overflow in this subtraction is well-defined
and that the &x[...] might not be linear for the inner loop.

Thanks,
Richard