On Wed, Aug 1, 2018 at 12:25 PM Richard Sandiford <richard.sandif...@arm.com> wrote:
>
> Richard Biener <richard.guent...@gmail.com> writes:
> > On Mon, Jul 30, 2018 at 7:47 PM Richard Sandiford
> > <richard.sandif...@arm.com> wrote:
> >>
> >> [Sorry, somehow missed this till now]
> >>
> >> Richard Biener <richard.guent...@gmail.com> writes:
> >> > On Mon, Jul 23, 2018 at 5:05 PM Richard Sandiford
> >> > <richard.sandif...@arm.com> wrote:
> >> >>
> >> >> Marc Glisse <marc.gli...@inria.fr> writes:
> >> >> > On Fri, 20 Jul 2018, Richard Sandiford wrote:
> >> >> >
> >> >> >> --- gcc/match.pd        2018-07-18 18:44:22.565914281 +0100
> >> >> >> +++ gcc/match.pd        2018-07-20 11:24:33.692045585 +0100
> >> >> >> @@ -4924,3 +4924,37 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> >> >> >>      (if (inverse_conditions_p (@0, @2)
> >> >> >>           && element_precision (type) == element_precision (op_type))
> >> >> >>       (view_convert (cond_op @2 @3 @4 @5 (view_convert:op_type @1)))))))
> >> >> >> +
> >> >> >> +/* For pointers @0 and @2 and unsigned constant offset @1, look for
> >> >> >> +   expressions like:
> >> >> >> +
> >> >> >> +     A: (@0 + @1 < @2) | (@2 + @1 < @0)
> >> >> >> +     B: (@0 + @1 <= @2) | (@2 + @1 <= @0)
> >> >> >> +
> >> >> >> +   If pointers are known not to wrap, B checks whether @1 bytes starting at
> >> >> >> +   @0 and @2 do not overlap, while A tests the same thing for @1 + 1 bytes.
> >> >> >> +   A is more efficiently tested as:
> >> >> >> +
> >> >> >> +     ((sizetype) @0 - (sizetype) @2 + @1) > (@1 * 2)
> >> >> >> +
> >> >> >> +   as long as @1 * 2 doesn't overflow.  B is the same with @1 replaced
> >> >> >> +   with @1 - 1.  */
> >> >> >> +(for ior (truth_orif truth_or bit_ior)
> >> >> >> + (for cmp (le lt)
> >> >> >> +  (simplify
> >> >> >> +   (ior (cmp (pointer_plus:s @0 INTEGER_CST@1) @2)
> >> >> >> +        (cmp (pointer_plus:s @2 @1) @0))
> >> >> >
> >> >> > Do you want :c on cmp, in case it appears as @2 > @0 + @1 ?
> >> >> > (may need some care with "cmp == LE_EXPR" below)
> >> >> > Do you want :s on cmp as well?
> >> >> >
> >> >> >> +   (if (!flag_trapv && !TYPE_OVERFLOW_WRAPS (TREE_TYPE (@0)))
> >> >> >
> >> >> > Don't you want TYPE_OVERFLOW_UNDEFINED?
> >> >>
> >> >> Thanks, fixed below.  Think the cmp == LE_EXPR stuff is still ok with :c,
> >> >> since the generated code sets cmp to LE_EXPR when matching GE_EXPR.
> >> >>
> >> >> Tested as before.  OK to install?
> >> >>
> >> >> Richard
> >> >>
> >> >>
> >> >> 2018-07-23  Richard Sandiford  <richard.sandif...@arm.com>
> >> >>
> >> >> gcc/
> >> >>         * match.pd: Optimise pointer range checks.
> >> >>
> >> >> gcc/testsuite/
> >> >>         * gcc.dg/pointer-range-check-1.c: New test.
> >> >>         * gcc.dg/pointer-range-check-2.c: Likewise.
> >> >>
> >> >> Index: gcc/match.pd
> >> >> ===================================================================
> >> >> --- gcc/match.pd        2018-07-23 15:56:47.000000000 +0100
> >> >> +++ gcc/match.pd        2018-07-23 15:58:33.480269844 +0100
> >> >> @@ -4924,3 +4924,37 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> >> >>      (if (inverse_conditions_p (@0, @2)
> >> >>           && element_precision (type) == element_precision (op_type))
> >> >>       (view_convert (cond_op @2 @3 @4 @5 (view_convert:op_type @1)))))))
> >> >> +
> >> >> +/* For pointers @0 and @2 and unsigned constant offset @1, look for
> >> >> +   expressions like:
> >> >> +
> >> >> +     A: (@0 + @1 < @2) | (@2 + @1 < @0)
> >> >> +     B: (@0 + @1 <= @2) | (@2 + @1 <= @0)
> >> >> +
> >> >> +   If pointers are known not to wrap, B checks whether @1 bytes starting at
> >> >> +   @0 and @2 do not overlap, while A tests the same thing for @1 + 1 bytes.
> >> >> +   A is more efficiently tested as:
> >> >> +
> >> >> +     ((sizetype) @0 - (sizetype) @2 + @1) > (@1 * 2)
> >> >> +
> >> >> +   as long as @1 * 2 doesn't overflow.  B is the same with @1 replaced
> >> >> +   with @1 - 1.  */
> >> >> +(for ior (truth_orif truth_or bit_ior)
> >> >> + (for cmp (le lt)
> >> >> +  (simplify
> >> >> +   (ior (cmp:cs (pointer_plus:s @0 INTEGER_CST@1) @2)
> >> >> +        (cmp:cs (pointer_plus:s @2 @1) @0))
> >> >> +   (if (!flag_trapv && TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (@0)))
> >> >
> >> > no need to check flag_trapv, pointer arithmetic is not covered by
> >> > -ftrapv.
> >>
> >> This was there because we're converting pointer arithmetic into integer
> >> arithmetic, and -ftrapv could cause that new integer code to trap.
> >> TYPE_OVERFLOW_UNDEFINED says that pointer overflow is undefined,
> >> but it seemed bad form to make it actually trap, especially when the
> >> above does some reassociation too.
> >
> > Ok, but then maybe check TYPE_OVERFLOW_WRAPS () on the type
> > you are doing the arithmetic on (sizetype) instead?
>
> OK, done below.
>
> >> >> +    /* Convert the B form to the A form.  */
> >> >> +    (with { offset_int off = wi::to_offset (@1) - (cmp == LE_EXPR ? 1 : 0); }
> >> >> +     /* Always fails for negative values.  */
> >> >> +     (if (wi::min_precision (off, UNSIGNED) * 2 <= TYPE_PRECISION (sizetype))
> >> >> +      /* It doesn't matter whether we use @2 - @0 or @0 - @2, so let
> >> >> +         tree_swap_operands_p pick a canonical order.  */
> >> >
> >> > gimple_resimplify takes care of that - well, it doesn't since minus isn't
> >> > commutative... I guess you get better CSE later when doing this thus ok,
> >> > but it does look a bit off here ;)
> >> >
> >> > I think you shouldn't use 'sizetype' without guarding this whole thing
> >> > with TYPE_PRECISION (sizetype) == TYPE_PRECISION (TREE_TYPE (@0)).
> >>
> >> OK, hadn't realised that could be false.  Would building the appropriate
> >> unsigned type be OK without the condition, or does it need to be sizetype?
> >
> > That would probably work but it might be dead slow on those targets that
> > choose to have a lower-precision sizetype (ISTR some have 24bit pointers
> > but 16bit sizetype because there's no arithmetic for 24bit but in
> > offsetted addressing).  So the transform is likely not a win for those.
>
> Ah, yeah.  The version below adds the == test.
>
> >> > Since the original IL performs an ordered compare of two eventually
> >> > unrelated pointers (or pointers adjusted in a way that crosses object
> >> > boundaries) (uh... ;))
> >> > I wonder if you can use POINTER_DIFF_EXPR here to avoid the sizetype
> >> > conversion?  Since POINTER_PLUS_EXPR offsets are supposed to be
> >> > interpreted "signed" that should also handle negative offsets
> >> > correctly then?
> >>
> >> The transform isn't valid when @1 is negative though, so I think we
> >> need that check either way.
> >
> >   + /* Always fails for negative values.  */
> >   + (if (wi::min_precision (off, UNSIGNED) * 2 <= TYPE_PRECISION (sizetype))
> >
> > shouldn't that be wi::min_precision (off * 2, UNSIGNED) <=
> > TYPE_PRECISION (sizetype)?
>
> Gah, yes :-(
>
> >> I could use POINTER_DIFF_EXPR and convert to unsigned though, if that
> >> seems better.  That might also allow us to CSE the retained
> >> POINTER_PLUS_EXPRs.
> >
> > Yeah, that sounds better then.
>
> And the CSEing did happen, so we now get the "ideal" sequence for the
> original alias checks:
>
>         add     x0, x0, 15
>         sub     x6, x0, x1
>         sub     x5, x0, x2
>         cmp     x6, 30
>         ccmp    x5, 30, 0, hi
>
> whereas the original patch had an extra add.
>
> >> > Also, when @2 == @0 + (@1+1) then the original condition is true but
> >> > ((sizetype) @0 - (sizetype) @2 + @1) > (@1 * 2) is not?
> >> >   (sizetype) @0 - (sizetype) (@0 + @1 + 1) + @1 > @1 * 2
> >> >   -> -1 > @1 * 2
> >> >
> >> > which is false.  So I can't really see how you can apply this transform
> >> > in general (it would be fine for generating alias checks but a bit more
> >> > pessimistic).
> >> >
> >> > But maybe I am missing something?
> >>
> >> It relies on sizetype being unsigned: (sizetype)-1 > @1 * 2 is true.
> >
> > Hmm, so mathematically this is
> >
> >   (@0 - @2) % modreduce + @1 > @1 * 2
>
> Probably not % if you're operating over Z, since if @0 - @2 = -@1 * 2
> (and -@1 * 2 % modreduce == -@1 * 2), the condition above will be false
> while we want it to be true.  But...
>
> > then, but I don't want to think too much about this since Marc didn't
> > object here ;)
>
> ...we start out converting the range check for A to:
>
>   !(-@1 <= @0 - @2 <= @1)
>
> i.e. @1 + 1 bytes at @0 and @2 are free from overlap.  Then we just
> apply the usual trick:
>
>   A <= X <= B -> (unsigned) (X - A) <= (unsigned) (B - A) when A <= B
>
> Substituting @0 - @2 for X, -@1 for A and @1 for B gives:
>
>      !((unsigned) (@0 - @2 + @1) <= (unsigned) (@1 * 2))
>   -> (unsigned) (@0 - @2 + @1) > (unsigned) (@1 * 2)
>
> Then the B case is just an off-by-one variant.
>
> Patch tested as before.
OK.  Thanks, Richard.

> Thanks,
> Richard
>
>
> 2018-08-01  Richard Sandiford  <richard.sandif...@arm.com>
>
> gcc/
>         * match.pd: Optimise pointer range checks.
>
> gcc/testsuite/
>         * gcc.dg/pointer-range-check-1.c: New test.
>         * gcc.dg/pointer-range-check-2.c: Likewise.
>
> Index: gcc/match.pd
> ===================================================================
> --- gcc/match.pd        2018-07-24 19:04:14.985363724 +0100
> +++ gcc/match.pd        2018-08-01 10:55:10.695766411 +0100
> @@ -4937,3 +4937,63 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>      (if (inverse_conditions_p (@0, @2)
>           && element_precision (type) == element_precision (op_type))
>       (view_convert (cond_op @2 @3 @4 @5 (view_convert:op_type @1)))))))
> +
> +/* For pointers @0 and @2 and nonnegative constant offset @1, look for
> +   expressions like:
> +
> +     A: (@0 + @1 < @2) | (@2 + @1 < @0)
> +     B: (@0 + @1 <= @2) | (@2 + @1 <= @0)
> +
> +   If pointers are known not to wrap, B checks whether @1 bytes starting
> +   at @0 and @2 do not overlap, while A tests the same thing for @1 + 1
> +   bytes.  A is more efficiently tested as:
> +
> +     A: (sizetype) (@0 + @1 - @2) > @1 * 2
> +
> +   The equivalent expression for B is given by replacing @1 with @1 - 1:
> +
> +     B: (sizetype) (@0 + (@1 - 1) - @2) > (@1 - 1) * 2
> +
> +   @0 and @2 can be swapped in both expressions without changing the result.
> +
> +   The folds rely on sizetype's being unsigned (which is always true)
> +   and on its being the same width as the pointer (which we have to check).
> +
> +   The fold replaces two pointer_plus expressions, two comparisons and
> +   an IOR with a pointer_plus, a pointer_diff, and a comparison, so in
> +   the best case it's a saving of two operations.  The A fold retains one
> +   of the original pointer_pluses, so is a win even if both pointer_pluses
> +   are used elsewhere.  The B fold is a wash if both pointer_pluses are
> +   used elsewhere, since all we end up doing is replacing a comparison with
> +   a pointer_plus.
> +   We do still apply the fold under those circumstances
> +   though, in case applying it to other conditions eventually makes one of
> +   the pointer_pluses dead.  */
> +(for ior (truth_orif truth_or bit_ior)
> + (for cmp (le lt)
> +  (simplify
> +   (ior (cmp:cs (pointer_plus@3 @0 INTEGER_CST@1) @2)
> +        (cmp:cs (pointer_plus@4 @2 @1) @0))
> +   (if (TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (@0))
> +        && TYPE_OVERFLOW_WRAPS (sizetype)
> +        && TYPE_PRECISION (TREE_TYPE (@0)) == TYPE_PRECISION (sizetype))
> +    /* Calculate the rhs constant.  */
> +    (with { offset_int off = wi::to_offset (@1) - (cmp == LE_EXPR ? 1 : 0);
> +            offset_int rhs = off * 2; }
> +     /* Always fails for negative values.  */
> +     (if (wi::min_precision (rhs, UNSIGNED) <= TYPE_PRECISION (sizetype))
> +      /* Since the order of @0 and @2 doesn't matter, let tree_swap_operands_p
> +         pick a canonical order.  This increases the chances of using the
> +         same pointer_plus in multiple checks.  */
> +      (with { bool swap_p = tree_swap_operands_p (@0, @2);
> +              tree rhs_tree = wide_int_to_tree (sizetype, rhs); }
> +       (if (cmp == LT_EXPR)
> +        (gt (convert:sizetype
> +             (pointer_diff:ssizetype { swap_p ? @4 : @3; }
> +                                     { swap_p ? @0 : @2; }))
> +         { rhs_tree; })
> +        (gt (convert:sizetype
> +             (pointer_diff:ssizetype
> +              (pointer_plus { swap_p ? @2 : @0; }
> +                            { wide_int_to_tree (sizetype, off); })
> +              { swap_p ? @0 : @2; }))
> +         { rhs_tree; })))))))))
> Index: gcc/testsuite/gcc.dg/pointer-range-check-1.c
> ===================================================================
> --- /dev/null   2018-07-26 10:26:13.137955424 +0100
> +++ gcc/testsuite/gcc.dg/pointer-range-check-1.c        2018-08-01 10:55:10.695766411 +0100
> @@ -0,0 +1,37 @@
> +/* { dg-do compile { target { ilp32 || lp64 } } } */
> +/* { dg-options "-O2 -fno-ipa-icf -fdump-tree-optimized" } */
> +
> +/* All four functions should be folded to:
> +
> +     (sizetype) (a + 15 - b) > 30.  */
> +
> +_Bool
> +f1 (char *a, char *b)
> +{
> +  return (a + 16 <= b) || (b + 16 <= a);
> +}
> +
> +_Bool
> +f2 (char *a, char *b)
> +{
> +  return (a + 15 < b) || (b + 15 < a);
> +}
> +
> +_Bool
> +f3 (char *a, char *b)
> +{
> +  return (a + 16 <= b) | (b + 16 <= a);
> +}
> +
> +_Bool
> +f4 (char *a, char *b)
> +{
> +  return (a + 15 < b) | (b + 15 < a);
> +}
> +
> +/* { dg-final { scan-tree-dump-times { = [^\n]* - [^\n]*;} 4 "optimized" } } */
> +/* { dg-final { scan-tree-dump-times { = [^\n]* \+ [^\n]*;} 4 "optimized" } } */
> +/* { dg-final { scan-tree-dump-times { = [^\n]*\ > [^\n]*;} 4 "optimized" } } */
> +/* { dg-final { scan-tree-dump-not {=[^\n]*\ < [^\n]*;} "optimized" } } */
> +/* { dg-final { scan-tree-dump-times { \+ 15} 4 "optimized" } } */
> +/* { dg-final { scan-tree-dump-times { > 30} 4 "optimized" } } */
> Index: gcc/testsuite/gcc.dg/pointer-range-check-2.c
> ===================================================================
> --- /dev/null   2018-07-26 10:26:13.137955424 +0100
> +++ gcc/testsuite/gcc.dg/pointer-range-check-2.c        2018-08-01 10:55:10.695766411 +0100
> @@ -0,0 +1,31 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fno-ipa-icf -fwrapv-pointer -fdump-tree-optimized" } */
> +
> +_Bool
> +f1 (char *a, char *b)
> +{
> +  return (a + 16 <= b) || (b + 16 <= a);
> +}
> +
> +_Bool
> +f2 (char *a, char *b)
> +{
> +  return (a + 15 < b) || (b + 15 < a);
> +}
> +
> +_Bool
> +f3 (char *a, char *b)
> +{
> +  return (a + 16 <= b) | (b + 16 <= a);
> +}
> +
> +_Bool
> +f4 (char *a, char *b)
> +{
> +  return (a + 15 < b) | (b + 15 < a);
> +}
> +
> +/* { dg-final { scan-tree-dump-not { = [^\n]* - [^\n]*;} "optimized" } } */
> +/* { dg-final { scan-tree-dump-times { = [^\n]* \+ [^\n]*;} 8 "optimized" } } */
> +/* { dg-final { scan-tree-dump-times { \+ 15} 4 "optimized" } } */
> +/* { dg-final { scan-tree-dump-times { \+ 16} 4 "optimized" } } */