On Fri, Sep 30, 2011 at 1:11 PM, Jakub Jelinek <[email protected]> wrote:
> On Fri, Sep 30, 2011 at 12:33:07PM +0200, Richard Guenther wrote:
>> > + low = build_int_cst (TREE_TYPE (exp), 0);
>> > + high = low;
>> > + in_p = 0;
>> > + strict_overflow_p = false;
>> > + is_bool = TREE_CODE (TREE_TYPE (exp)) == BOOLEAN_TYPE;
>>
>> Effective boolean are also TYPE_PRECISION () == 1 types. Remember
>> we don't preserve conversions from BOOLEAN_TYPE to such types.
>
> I can replace these with TYPE_PRECISION (TREE_TYPE (exp)) == 1;
> checks if you prefer, though maybe it would need to also do
> && TYPE_UNSIGNED (TREE_TYPE (exp)), at least if different operands of
> the | have different inner 1-bit signedness we could not merge them.
The canonical test for boolean-kind types is now
TREE_CODE (TREE_TYPE (exp)) == BOOLEAN_TYPE
|| TYPE_PRECISION (TREE_TYPE (exp)) == 1
Ada for example has non-1-precision BOOLEAN_TYPEs. But, see
very below.
>> > + CASE_CONVERT:
>> > + is_bool |= TREE_CODE (TREE_TYPE (arg0)) == BOOLEAN_TYPE;
>>
>> Likewise. Though I wonder why !=? Does it matter whether the extension
>> sign- or zero-extends?
>
> I think the |= is needed, this loop follows through not just casts, but
> then also through the comparisons and again through casts.
> It doesn't matter if the types or casts inside of the comparison argument
> are bool or not (and likely they will not be), yet we don't want to stop
> iterating because of that.
> It wants to reconstruct ranges from what e.g. fold in fold_range_test
> already created before, so stuff like
> (int) (((unsigned) x - 64U) <= 31U)
> for + [64, 95] range.
>
>> > + if (integer_onep (range_binop (LT_EXPR,
>> > integer_type_node,
>> > + p->low, 0, q->low, 0)))
>>
>> that's just
>>
>> tem = fold_binary (LT_EXPR, boolean_type_node, p->low, q->low);
>> if (tem && integer_onep (tem))
>> return -1;
>
> Ok, can change that.
>
>> which avoids building the LT_EXPR tree if it doesn't fold. Similar below.
>> (ISTR some integer_onep () variant that handles a NULL argument ...)
>
> Couldn't find any. integer_onep needs non-NULL, compare_tree_int too.
>
>> > + /* Try to merge ranges. */
>> > + for (first = i; i < length; i++)
>> > + {
>> > + tree low = ranges[i].low;
>> > + tree high = ranges[i].high;
>> > + int in_p = ranges[i].in_p;
>> > + bool strict_overflow_p = ranges[i].strict_overflow_p;
>> > +
>> > + for (j = i + 1; j < length; j++)
>> > + {
>>
>> That looks quadratic - do we want to limit this with a --param, simply
>> partitioning the array into quadratic chunks?
>
> This isn't quadratic (except in the hypothetical case
> where all merge_ranges calls would succeed, but then
> build_range_test would fail). In the likely case where if range merges
> succeed then update_range_test succeeds too this is just linear (plus the
> qsort before that which isn't linear though).
> So perhaps:
> + if (j > i + 1
> + && update_range_test (ranges + i, ranges + i + 1, j - i - 1, opcode,
> + ops, ranges[i].exp, in_p, low, high,
> + strict_overflow_p))
> + {
> + i = j - 1;
> + any_changes = true;
> + }
> could be
> + if (j > i + 1)
> + {
> + if (update_range_test (ranges + i, ranges + i + 1, j - i - 1,
> opcode,
> + ops, ranges[i].exp, in_p, low, high,
> + strict_overflow_p))
> + any_changes = true;
> + i = j - 1;
> + }
> (then it isn't quadratic), or could try a few times before giving up:
> + if (j > i + 1)
> + {
> + if (update_range_test (ranges + i, ranges + i + 1, j - i - 1,
> opcode,
> + ops, ranges[i].exp, in_p, low, high,
> + strict_overflow_p))
> + {
> + any_changes = true;
> + i = j - 1;
> + }
> + else if (update_fail_count == 64)
> + i = j - 1;
> + else
> + update_fail_count = 0;
> + }
> where int update_fail_count = 0; would be after the
> bool strict_overflow_p = ranges[i].strict_overflow_p;
> line in the outer loop.
>
>> > + if (ranges[i].exp != ranges[j].exp)
>> > + break;
>>
>> Or isn't it too bad because of this check?
>
> The above limits the chunks to a particular SSA_NAME. Within each
> chunk for the same SSA_NAME, the ranges are sorted in a way that
> merge_ranges will likely succeed, so it isn't quadratic unless
> update_range_test fails (see above).
>
> BTW, the second loop (the one that attempts to optimize
> x == 1 || x == 3 into (x & ~2) == 1 etc. is quadratic, which is why
> there is
> for (j = i + 1; j < length && j < i + 64; j++)
> don't think it is a limit people will often run into and thus I don't
> think it is worth adding a --param= for that.
Ok.
>> > + if (!merge_ranges (&in_p, &low, &high, in_p, low, high,
>> > + ranges[j].in_p, ranges[j].low,
>> > ranges[j].high))
>> > + break;
>>
>> And this early out? I suppose some comment on why together with
>> the sorting this is of limited complexity would help.
>>
>> > @@ -2447,6 +2864,9 @@ reassociate_bb (basic_block bb)
>> > optimize_ops_list (rhs_code, &ops);
>> > }
>> >
>> > + if (rhs_code == BIT_IOR_EXPR || rhs_code == BIT_AND_EXPR)
>>
>> I think we want to check that the type is boolean here, so we don't setup
>> the machinery needlessly?
>
> It is boolean only in some testcases, the is_bool stuff discussed at the
> beginning above was originally just an early return
> if (TREE_CODE (TREE_TYPE (exp)) != BOOLEAN_TYPE)
> return;
> before the loop, but it turned out that often the type of the | operands
> is integer, with either bool casted to integer, or with the type of EQ_EXPR
> etc. being integer instead of bool.
Really? The type of EQ_EXPR should be always either BOOLEAN_TYPE
or INTEGRAL_TYPE_P with TYPE_PRECISION == 1. That's what
the gimple verifier checks. Or do you mean that fold introduces these
kind of types during range-test simplification?
Richard.
>
> Jakub
>