On Mon, Jul 30, 2018 at 7:47 PM Richard Sandiford
<richard.sandif...@arm.com> wrote:
>
> [Sorry, somehow missed this till now]
>
> Richard Biener <richard.guent...@gmail.com> writes:
> > On Mon, Jul 23, 2018 at 5:05 PM Richard Sandiford
> > <richard.sandif...@arm.com> wrote:
> >>
> >> Marc Glisse <marc.gli...@inria.fr> writes:
> >> > On Fri, 20 Jul 2018, Richard Sandiford wrote:
> >> >
> >> >> --- gcc/match.pd     2018-07-18 18:44:22.565914281 +0100
> >> >> +++ gcc/match.pd     2018-07-20 11:24:33.692045585 +0100
> >> >> @@ -4924,3 +4924,37 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> >> >>    (if (inverse_conditions_p (@0, @2)
> >> >>         && element_precision (type) == element_precision (op_type))
> >> >>     (view_convert (cond_op @2 @3 @4 @5 (view_convert:op_type @1)))))))
> >> >> +
> >> >> +/* For pointers @0 and @2 and unsigned constant offset @1, look for
> >> >> +   expressions like:
> >> >> +
> >> >> +   A: (@0 + @1 < @2) | (@2 + @1 < @0)
> >> >> +   B: (@0 + @1 <= @2) | (@2 + @1 <= @0)
> >> >> +
> >> >> +   If pointers are known not to wrap, B checks whether @1 bytes 
> >> >> starting at
> >> >> +   @0 and @2 do not overlap, while A tests the same thing for @1 + 1 
> >> >> bytes.
> >> >> +   A is more efficiently tested as:
> >> >> +
> >> >> +   ((sizetype) @0 - (sizetype) @2 + @1) > (@1 * 2)
> >> >> +
> >> >> +   as long as @1 * 2 doesn't overflow.  B is the same with @1 replaced
> >> >> +   with @1 - 1.  */
> >> >> +(for ior (truth_orif truth_or bit_ior)
> >> >> + (for cmp (le lt)
> >> >> +  (simplify
> >> >> +   (ior (cmp (pointer_plus:s @0 INTEGER_CST@1) @2)
> >> >> +    (cmp (pointer_plus:s @2 @1) @0))
> >> >
> >> > Do you want :c on cmp, in case it appears as @2 > @0 + @1 ? (may need 
> >> > some
> >> > care with "cmp == LE_EXPR" below)
> >> > Do you want :s on cmp as well?
> >> >
> >> >> +   (if (!flag_trapv && !TYPE_OVERFLOW_WRAPS (TREE_TYPE (@0)))
> >> >
> >> > Don't you want TYPE_OVERFLOW_UNDEFINED?
> >>
> >> Thanks, fixed below.  Think the cmp == LE_EXPR stuff is still ok with :c,
> >> since the generated code sets cmp to LE_EXPR when matching GE_EXPR.
> >>
> >> Tested as before.  OK to install?
> >>
> >> Richard
> >>
> >>
> >> 2018-07-23  Richard Sandiford  <richard.sandif...@arm.com>
> >>
> >> gcc/
> >>         * match.pd: Optimise pointer range checks.
> >>
> >> gcc/testsuite/
> >>         * gcc.dg/pointer-range-check-1.c: New test.
> >>         * gcc.dg/pointer-range-check-2.c: Likewise.
> >>
> >> Index: gcc/match.pd
> >> ===================================================================
> >> --- gcc/match.pd        2018-07-23 15:56:47.000000000 +0100
> >> +++ gcc/match.pd        2018-07-23 15:58:33.480269844 +0100
> >> @@ -4924,3 +4924,37 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> >>     (if (inverse_conditions_p (@0, @2)
> >>          && element_precision (type) == element_precision (op_type))
> >>      (view_convert (cond_op @2 @3 @4 @5 (view_convert:op_type @1)))))))
> >> +
> >> +/* For pointers @0 and @2 and unsigned constant offset @1, look for
> >> +   expressions like:
> >> +
> >> +   A: (@0 + @1 < @2) | (@2 + @1 < @0)
> >> +   B: (@0 + @1 <= @2) | (@2 + @1 <= @0)
> >> +
> >> +   If pointers are known not to wrap, B checks whether @1 bytes starting 
> >> at
> >> +   @0 and @2 do not overlap, while A tests the same thing for @1 + 1 
> >> bytes.
> >> +   A is more efficiently tested as:
> >> +
> >> +   ((sizetype) @0 - (sizetype) @2 + @1) > (@1 * 2)
> >> +
> >> +   as long as @1 * 2 doesn't overflow.  B is the same with @1 replaced
> >> +   with @1 - 1.  */
> >> +(for ior (truth_orif truth_or bit_ior)
> >> + (for cmp (le lt)
> >> +  (simplify
> >> +   (ior (cmp:cs (pointer_plus:s @0 INTEGER_CST@1) @2)
> >> +       (cmp:cs (pointer_plus:s @2 @1) @0))
> >> +   (if (!flag_trapv && TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (@0)))
> >
> > no need to check flag_trapv, pointer arithmetic is not covered by -ftrapv.
>
> This was there because we're converting pointer arithmetic into integer
> arithmetic, and -ftrapv could cause that new integer code to trap.
> TYPE_OVERFLOW_UNDEFINED says that pointer overflow is undefined,
> but it seemed bad form to make it actually trap, especially when the
> above does some reassociation too.

Ok, but then maybe check TYPE_OVERFLOW_WRAPS () on the type
you are doing the arithmetic on (sizetype) instead?

>
> >> +    /* Convert the B form to the A form.  */
> >> + (with { offset_int off = wi::to_offset (@1) - (cmp == LE_EXPR ? 1 :
> >> 0); }
> >> +     /* Always fails for negative values.  */
> >> + (if (wi::min_precision (off, UNSIGNED) * 2 <= TYPE_PRECISION
> >> (sizetype))
> >> +      /* It doesn't matter whether we use @2 - @0 or @0 - @2, so let
> >> +        tree_swap_operands_p pick a canonical order.  */
> >
> > gimple_resimplify takes care of that - well, it doesn't since minus isn't
> > commutative...  I guess you get better CSE later when doing this thus ok,
> > but it does look a bit off here ;)
> >
> > I think you shouldn't use 'sizetype' without guarding this whole thing
> > with TYPE_PRECISION (sizetype) == TYPE_PRECISION (TREE_TYPE (@0)).
>
> OK, hadn't realised that could be false.  Would building the appropriate
> unsigned type be OK without the condition, or does it need to be sizetype?

That would probably work but it might be dead slow on those targets that choose
to have a lower-precision sizetype (ISTR some have 24bit pointers but 16bit
sizetype because there's no arithmetic for 24bit but in offsetted addressing).
So the transform is likely not a win for those.

> > Since the original IL performs an ordered compare of two eventually 
> > unrelated
> > pointers (or pointers adjusted in a way that crosses object
> > boundaries) (uh... ;))
> > I wonder if you can use POINTER_DIFF_EXPR here to avoid the sizetype
> > conversion?  Since POINTER_PLUS_EXPR offsets are supposed to be
> > interpreted "signed" that should also handle negative offsets correctly 
> > then?
>
> The transform isn't valid when @1 is negative though, so I think we
> need that check either way.

+     /* Always fails for negative values.  */
+     (if (wi::min_precision (off, UNSIGNED) * 2 <= TYPE_PRECISION (sizetype))

shouldn't that be wi::min_precision (off * 2, UNSIGNED) <=
TYPE_PRECISION (sizetype)?

> I could use POINTER_DIFF_EXPR and convert to unsigned though, if that
> seems better.  That might also allow us to CSE the retained
> POINTER_PLUS_EXPRs.

Yeah, that sounds better then.

> > Also, when @2 == @0 + (@1+1) then the original condition is true but
> > ((sizetype) @0 - (sizetype) @2 + @1) > (@1 * 2) is not?
> >    (sizetype) @0 - (sizetype) (@0 + @1 + 1) + @1 > @1 * 2
> > -> -1 > @1 * 2
> >
> > which is false.  So I can't really see how you can apply this transform in
> > general (it would be fine for generating alias checks but a bit more
> > pessimistic).
> >
> > But maybe I am missing something?
>
> It relies on sizetype being unsigned: (sizetype)-1 > @1 * 2 is true.

Hmm, so mathematically this is

  (@0 - @2) % modreduce + @1 > @1 * 2

then, but I don't want to think too much about this since Marc didn't
object here ;)

Thanks,
Richard.

> Thanks,
> Richard
>
> > Richard.
> >
> >> +      (with { tree ptr1 = @0, ptr2 = @2;
> >> +             if (tree_swap_operands_p (ptr1, ptr2))
> >> +               std::swap (ptr1, ptr2); }
> >> +       (gt (plus (minus (convert:sizetype { ptr1; })
> >> +                       (convert:sizetype { ptr2; }))
> >> +                { wide_int_to_tree (sizetype, off); })
> >> +          { wide_int_to_tree (sizetype, off * 2); }))))))))
> >> Index: gcc/testsuite/gcc.dg/pointer-range-check-1.c
> >> ===================================================================
> >> --- /dev/null   2018-07-09 14:52:09.234750850 +0100
> >> +++ gcc/testsuite/gcc.dg/pointer-range-check-1.c 2018-07-23
> >> 15:58:33.480269844 +0100
> >> @@ -0,0 +1,37 @@
> >> +/* { dg-do compile } */
> >> +/* { dg-options "-O2 -fno-ipa-icf -fdump-tree-optimized" } */
> >> +
> >> +/* All four functions should be folded to:
> >> +
> >> +   ((sizetype) a - (sizetype) b + 15) < 30.  */
> >> +
> >> +_Bool
> >> +f1 (char *a, char *b)
> >> +{
> >> +  return (a + 16 <= b) || (b + 16 <= a);
> >> +}
> >> +
> >> +_Bool
> >> +f2 (char *a, char *b)
> >> +{
> >> +  return (a + 15 < b) || (b + 15 < a);
> >> +}
> >> +
> >> +_Bool
> >> +f3 (char *a, char *b)
> >> +{
> >> +  return (a + 16 <= b) || (b + 16 <= a);
> >> +}
> >> +
> >> +_Bool
> >> +f4 (char *a, char *b)
> >> +{
> >> +  return (a + 15 < b) || (b + 15 < a);
> >> +}
> >> +
> >> +/* { dg-final { scan-tree-dump-times { = [^\n]* - [^\n]*;} 4
> >> "optimized" } } */
> >> +/* { dg-final { scan-tree-dump-times { = [^\n]* \+ [^\n]*;} 4
> >> "optimized" } } */
> >> +/* { dg-final { scan-tree-dump-times { = [^\n]*\ > [^\n]*;} 4
> >> "optimized" } } */
> >> +/* { dg-final { scan-tree-dump-not {=[^\n]*\ < [^\n]*;} "optimized" } } */
> >> +/* { dg-final { scan-tree-dump-times { \+ 15} 4 "optimized" } } */
> >> +/* { dg-final { scan-tree-dump-times { > 30} 4 "optimized" } } */
> >> Index: gcc/testsuite/gcc.dg/pointer-range-check-2.c
> >> ===================================================================
> >> --- /dev/null   2018-07-09 14:52:09.234750850 +0100
> >> +++ gcc/testsuite/gcc.dg/pointer-range-check-2.c 2018-07-23
> >> 15:58:33.480269844 +0100
> >> @@ -0,0 +1,31 @@
> >> +/* { dg-do compile } */
> >> +/* { dg-options "-O2 -fno-ipa-icf -fwrapv-pointer
> >> -fdump-tree-optimized" } */
> >> +
> >> +_Bool
> >> +f1 (char *a, char *b)
> >> +{
> >> +  return (a + 16 <= b) || (b + 16 <= a);
> >> +}
> >> +
> >> +_Bool
> >> +f2 (char *a, char *b)
> >> +{
> >> +  return (a + 15 < b) || (b + 15 < a);
> >> +}
> >> +
> >> +_Bool
> >> +f3 (char *a, char *b)
> >> +{
> >> +  return (a + 16 <= b) || (b + 16 <= a);
> >> +}
> >> +
> >> +_Bool
> >> +f4 (char *a, char *b)
> >> +{
> >> +  return (a + 15 < b) || (b + 15 < a);
> >> +}
> >> +
> >> +/* { dg-final { scan-tree-dump-not { = [^\n]* - [^\n]*;} "optimized" } } 
> >> */
> >> +/* { dg-final { scan-tree-dump-times { = [^\n]* \+ [^\n]*;} 8
> >> "optimized" } } */
> >> +/* { dg-final { scan-tree-dump-times { \+ 15} 4 "optimized" } } */
> >> +/* { dg-final { scan-tree-dump-times { \+ 16} 4 "optimized" } } */

Reply via email to