Re: [PATCH] PR tree-optimization/108359 - Utilize op1 == op2 when invoking range-ops folding.

Richard Biener via Gcc-patches Sun, 15 Jan 2023 23:19:59 -0800

On Fri, Jan 13, 2023 at 11:07 PM Andrew MacLeod <amacl...@redhat.com> wrote:
>
>
> On 1/13/23 16:54, Jakub Jelinek wrote:
> > On Fri, Jan 13, 2023 at 04:23:20PM -0500, Andrew MacLeod via Gcc-patches 
> > wrote:
> >> fold_range() already invokes wi_fold_in_parts to try to get more refined
> >> information. If the subranges are quite small, it will do each individual
> >> calculation and combine the results.
> >>
> >> x * y with x = [1,3] and y = [1,3]  is broken down and we calculate each
> >> possibility and we end up with [1,4][6,6][9,9] instead of [1,9]
> >>
> >> We limit this as the time is between quadratic to exponential depending on
> >> the number of elements in x and y.
> >>
> >> If we also check the relation and determine that x == y, we don't need to
> >> worry about that growth as this process is linear.  The above case will be
> >> broken down to just  1*1, 2*2 and 3*3, resulting in a range of [1,
> >> 1][4,4][9,9].
> >>
> >>   In the testcase, it happens to be the right_shift operation, but this
> >> solution is generic and applies to all range-op operations. I added a
> >> testcase which checks >>, *, + and %.
> >>
> >> I also arbitrarily chose 8 elements as the limit for breaking down
> >> individual operations.  The overall compile time change for this is
> >> negligible.
> >>
> >> Although this is a regression fix, it will affect all operations where x ==
> >> y, which is where my initial hesitancy arose.
> >>
> >> Regardless, bootstrapped on x86_64-pc-linux-gnu with no regressions.  OK 
> >> for
> >> trunk?
> > Will defer to Aldy, just some nits.
> Did you mean Richi?


I suppose Aldy, since it's a regression fix it's OK even during stage4.

I do have a comment as well though, you do

+  // If op1 and op2 are equivalences, then we don't need a complete cross
+  // product, just pairs of matching elements.
+  if (relation_equiv_p (rel) && lh == rh && num_lh <= 16)
+    {
+      int_range_max tmp;
+      r.set_undefined ();
+      for (unsigned x = 0; x < num_lh; ++x)
+       {
+         wide_int lh_lb = lh.lower_bound (x);
+         wide_int lh_ub = lh.upper_bound (x);
+         wi_fold_in_parts_equiv (tmp, type, lh_lb, lh_ub);

and that does

+  widest_int lh_range = wi::sub (widest_int::from (lh_ub, TYPE_SIGN (type)),
+                                widest_int::from (lh_lb, TYPE_SIGN (type)));
+  // if there are 1 to 8 values in the LH range, split them up.
+  r.set_undefined ();
+  if (lh_range >= 0 && lh_range <= 7)
+    {
+      for (unsigned x = 0; x <= lh_range; x++)

which in total limits the number of sub-ranges in the output but in an
odd way.  It's also all-or-nothing.  IIRC there's a hard limit on the
number of sub-ranges in the output anyway via int_range<N>, so
why not use that and always do the first loop over the sub-ranges
of the inputs and the second loop over the range members but
stop when we reach N-1 and then use wi_fold on the remainds?

Your above code suggests we go up to 112 sub-ranges and once
we'd reach 113 we'd fold down to a single.

Maybe my "heuristic" wouldn't be much better, but then somehow
breaking the heuristic down to a single magic number would be
better, esp. since .union_ will undo some of the breakup when
reaching N?

> >
> >> +  // if there are 1 to 8 values in the LH range, split them up.
> >> +  r.set_undefined ();
> >> +  if (lh_range >= 0 && lh_range <= 7)
> >> +    {
> >> +      unsigned x;
> >> +      for (x = 0; x <= lh_range; x++)
> > Nothing uses x after the loop, so why not
> >        for (unsigned x = 0; x <= lh_range; x++)
> > instead?
>
> Just old habits.
>
>
> >> @@ -234,6 +264,26 @@ range_operator::fold_range (irange &r, tree type,
> >>     unsigned num_lh = lh.num_pairs ();
> >>     unsigned num_rh = rh.num_pairs ();
> >>
> >> +  // If op1 and op2 are equivalences, then we don't need a complete cross
> >> +  // product, just pairs of matching elements.
> >> +  if (relation_equiv_p (rel) && (lh == rh))
> > The ()s around lh == rh look superfluous to me.
> Yeah I just found it marginally more readable, but it is superfluous
> >> +    {
> >> +      int_range_max tmp;
> >> +      r.set_undefined ();
> >> +      for (unsigned x = 0; x < num_lh; ++x)
> > fold_range has an upper bound of num_lh * num_rh > 12, shouldn't something
> > like that be there for this case too?
> > I mean, every wi_fold_in_parts_equiv can result in 8 subranges,
> > but num_lh could be up to 255 here, it is true it is linear and union_
> > should merge excess ones, but still I wonder if some larger num_lh upper
> > bound like 20 or 32 wouldn't be useful.  Up to you...
> fold_range has the num_lh * num_rh limit because it was
> quadratic/exponential and changes rapidly. Since this was linear based
> on the number of sub ranges I didn't think it would matter much, but
> sure, we can put a similar limit on it.. 16 seems reasonable.
> >> +    {
> >> +      wide_int lh_lb = lh.lower_bound (x);
> >> +      wide_int lh_ub = lh.upper_bound (x);
> >> +      wi_fold_in_parts_equiv (tmp, type, lh_lb, lh_ub);
> >> +      r.union_ (tmp);
> >> +      if (r.varying_p ())
> >> +        break;
> >> +    }
> >> +      op1_op2_relation_effect (r, type, lh, rh, rel);
> >> +      update_known_bitmask (r, m_code, lh, rh);
> >> +      return true;
> >> +    }
> >> +
> >       Jakub
> >
> Updated patch attached...  I'll run it through testing whe the current
> one is done.
>
>
> Andrew

Re: [PATCH] PR tree-optimization/108359 - Utilize op1 == op2 when invoking range-ops folding.

Reply via email to