Re: [PATCH] fold_range_test like optimization on GIMPLE (PR tree-optimization/46309)

Richard Guenther Fri, 30 Sep 2011 05:27:06 -0700

On Fri, Sep 30, 2011 at 1:11 PM, Jakub Jelinek <[email protected]> wrote:
> On Fri, Sep 30, 2011 at 12:33:07PM +0200, Richard Guenther wrote:
>> > +  low = build_int_cst (TREE_TYPE (exp), 0);
>> > +  high = low;
>> > +  in_p = 0;
>> > +  strict_overflow_p = false;
>> > +  is_bool = TREE_CODE (TREE_TYPE (exp)) == BOOLEAN_TYPE;
>>
>> Effective boolean are also TYPE_PRECISION () == 1 types.  Remember
>> we don't preserve conversions from BOOLEAN_TYPE to such types.
>
> I can replace these with TYPE_PRECISION (TREE_TYPE (exp)) == 1;
> checks if you prefer, though maybe it would need to also do
> && TYPE_UNSIGNED (TREE_TYPE (exp)), at least if different operands of
> the | have different inner 1-bit signedness we could not merge them.


The canonical test for boolean-kind types is now

  TREE_CODE (TREE_TYPE (exp)) == BOOLEAN_TYPE
  || TYPE_PRECISION (TREE_TYPE (exp)) == 1

Ada for example has non-1-precision BOOLEAN_TYPEs.  But, see
very below.

>> > +       CASE_CONVERT:
>> > +         is_bool |= TREE_CODE (TREE_TYPE (arg0)) == BOOLEAN_TYPE;
>>
>> Likewise.  Though I wonder why !=?  Does it matter whether the extension
>> sign- or zero-extends?
>
> I think the |= is needed, this loop follows through not just casts, but
> then also through the comparisons and again through casts.
> It doesn't matter if the types or casts inside of the comparison argument
> are bool or not (and likely they will not be), yet we don't want to stop
> iterating because of that.
> It wants to reconstruct ranges from what e.g. fold in fold_range_test
> already created before, so stuff like
> (int) (((unsigned) x - 64U) <= 31U)
> for + [64, 95] range.
>
>> > +                 if (integer_onep (range_binop (LT_EXPR, 
>> > integer_type_node,
>> > +                                                p->low, 0, q->low, 0)))
>>
>> that's just
>>
>> tem = fold_binary (LT_EXPR, boolean_type_node, p->low, q->low);
>> if (tem && integer_onep (tem))
>>   return -1;
>
> Ok, can change that.
>
>> which avoids building the LT_EXPR tree if it doesn't fold.  Similar below.
>> (ISTR some integer_onep () variant that handles a NULL argument ...)
>
> Couldn't find any.  integer_onep needs non-NULL, compare_tree_int too.
>
>> > +  /* Try to merge ranges.  */
>> > +  for (first = i; i < length; i++)
>> > +    {
>> > +      tree low = ranges[i].low;
>> > +      tree high = ranges[i].high;
>> > +      int in_p = ranges[i].in_p;
>> > +      bool strict_overflow_p = ranges[i].strict_overflow_p;
>> > +
>> > +      for (j = i + 1; j < length; j++)
>> > +       {
>>
>> That looks quadratic - do we want to limit this with a --param, simply
>> partitioning the array into quadratic chunks?
>
> This isn't quadratic (except in the hypothetical case
> where all merge_ranges calls would succeed, but then
> build_range_test would fail).  In the likely case where if range merges
> succeed then update_range_test succeeds too this is just linear (plus the
> qsort before that which isn't linear though).
> So perhaps:
> +      if (j > i + 1
> +         && update_range_test (ranges + i, ranges + i + 1, j - i - 1, opcode,
> +                               ops, ranges[i].exp, in_p, low, high,
> +                               strict_overflow_p))
> +       {
> +         i = j - 1;
> +         any_changes = true;
> +       }
> could be
> +      if (j > i + 1)
> +       {
> +         if (update_range_test (ranges + i, ranges + i + 1, j - i - 1, 
> opcode,
> +                               ops, ranges[i].exp, in_p, low, high,
> +                               strict_overflow_p))
> +           any_changes = true;
> +         i = j - 1;
> +       }
> (then it isn't quadratic), or could try a few times before giving up:
> +      if (j > i + 1)
> +       {
> +         if (update_range_test (ranges + i, ranges + i + 1, j - i - 1, 
> opcode,
> +                               ops, ranges[i].exp, in_p, low, high,
> +                               strict_overflow_p))
> +           {
> +             any_changes = true;
> +             i = j - 1;
> +           }
> +         else if (update_fail_count == 64)
> +           i = j - 1;
> +         else
> +           update_fail_count = 0;
> +       }
> where int update_fail_count = 0; would be after the
>      bool strict_overflow_p = ranges[i].strict_overflow_p;
> line in the outer loop.
>
>> > +         if (ranges[i].exp != ranges[j].exp)
>> > +           break;
>>
>> Or isn't it too bad because of this check?
>
> The above limits the chunks to a particular SSA_NAME.  Within each
> chunk for the same SSA_NAME, the ranges are sorted in a way that
> merge_ranges will likely succeed, so it isn't quadratic unless
> update_range_test fails (see above).
>
> BTW, the second loop (the one that attempts to optimize
> x == 1 || x == 3 into (x & ~2) == 1 etc. is quadratic, which is why
> there is
>  for (j = i + 1; j < length && j < i + 64; j++)
> don't think it is a limit people will often run into and thus I don't
> think it is worth adding a --param= for that.

Ok.

>> > +         if (!merge_ranges (&in_p, &low, &high, in_p, low, high,
>> > +                            ranges[j].in_p, ranges[j].low, 
>> > ranges[j].high))
>> > +           break;
>>
>> And this early out?  I suppose some comment on why together with
>> the sorting this is of limited complexity would help.
>>
>> > @@ -2447,6 +2864,9 @@ reassociate_bb (basic_block bb)
>> >                  optimize_ops_list (rhs_code, &ops);
>> >                }
>> >
>> > +             if (rhs_code == BIT_IOR_EXPR || rhs_code == BIT_AND_EXPR)
>>
>> I think we want to check that the type is boolean here, so we don't setup
>> the machinery needlessly?
>
> It is boolean only in some testcases, the is_bool stuff discussed at the
> beginning above was originally just an early return
>  if (TREE_CODE (TREE_TYPE (exp)) != BOOLEAN_TYPE)
>    return;
> before the loop, but it turned out that often the type of the | operands
> is integer, with either bool casted to integer, or with the type of EQ_EXPR
> etc. being integer instead of bool.

Really?  The type of EQ_EXPR should be always either BOOLEAN_TYPE
or INTEGRAL_TYPE_P with TYPE_PRECISION == 1.  That's what
the gimple verifier checks.  Or do you mean that fold introduces these
kind of types during range-test simplification?

Richard.

>
>        Jakub
>

Re: [PATCH] fold_range_test like optimization on GIMPLE (PR tree-optimization/46309)

Reply via email to