On Fri, Jan 13, 2023 at 11:07 PM Andrew MacLeod <amacl...@redhat.com> wrote: > > > On 1/13/23 16:54, Jakub Jelinek wrote: > > On Fri, Jan 13, 2023 at 04:23:20PM -0500, Andrew MacLeod via Gcc-patches > > wrote: > >> fold_range() already invokes wi_fold_in_parts to try to get more refined > >> information. If the subranges are quite small, it will do each individual > >> calculation and combine the results. > >> > >> x * y with x = [1,3] and y = [1,3] is broken down and we calculate each > >> possibility and we end up with [1,4][6,6][9,9] instead of [1,9] > >> > >> We limit this as the time is between quadratic to exponential depending on > >> the number of elements in x and y. > >> > >> If we also check the relation and determine that x == y, we don't need to > >> worry about that growth as this process is linear. The above case will be > >> broken down to just 1*1, 2*2 and 3*3, resulting in a range of [1, > >> 1][4,4][9,9]. > >> > >> In the testcase, it happens to be the right_shift operation, but this > >> solution is generic and applies to all range-op operations. I added a > >> testcase which checks >>, *, + and %. > >> > >> I also arbitrarily chose 8 elements as the limit for breaking down > >> individual operations. The overall compile time change for this is > >> negligible. > >> > >> Although this is a regression fix, it will affect all operations where x == > >> y, which is where my initial hesitancy arose. > >> > >> Regardless, bootstrapped on x86_64-pc-linux-gnu with no regressions. OK > >> for > >> trunk? > > Will defer to Aldy, just some nits. > Did you mean Richi?
I suppose Aldy, since it's a regression fix it's OK even during stage4. I do have a comment as well though, you do + // If op1 and op2 are equivalences, then we don't need a complete cross + // product, just pairs of matching elements. + if (relation_equiv_p (rel) && lh == rh && num_lh <= 16) + { + int_range_max tmp; + r.set_undefined (); + for (unsigned x = 0; x < num_lh; ++x) + { + wide_int lh_lb = lh.lower_bound (x); + wide_int lh_ub = lh.upper_bound (x); + wi_fold_in_parts_equiv (tmp, type, lh_lb, lh_ub); and that does + widest_int lh_range = wi::sub (widest_int::from (lh_ub, TYPE_SIGN (type)), + widest_int::from (lh_lb, TYPE_SIGN (type))); + // if there are 1 to 8 values in the LH range, split them up. + r.set_undefined (); + if (lh_range >= 0 && lh_range <= 7) + { + for (unsigned x = 0; x <= lh_range; x++) which in total limits the number of sub-ranges in the output but in an odd way. It's also all-or-nothing. IIRC there's a hard limit on the number of sub-ranges in the output anyway via int_range<N>, so why not use that and always do the first loop over the sub-ranges of the inputs and the second loop over the range members but stop when we reach N-1 and then use wi_fold on the remainds? Your above code suggests we go up to 112 sub-ranges and once we'd reach 113 we'd fold down to a single. Maybe my "heuristic" wouldn't be much better, but then somehow breaking the heuristic down to a single magic number would be better, esp. since .union_ will undo some of the breakup when reaching N? > > > >> + // if there are 1 to 8 values in the LH range, split them up. > >> + r.set_undefined (); > >> + if (lh_range >= 0 && lh_range <= 7) > >> + { > >> + unsigned x; > >> + for (x = 0; x <= lh_range; x++) > > Nothing uses x after the loop, so why not > > for (unsigned x = 0; x <= lh_range; x++) > > instead? > > Just old habits. > > > >> @@ -234,6 +264,26 @@ range_operator::fold_range (irange &r, tree type, > >> unsigned num_lh = lh.num_pairs (); > >> unsigned num_rh = rh.num_pairs (); > >> > >> + // If op1 and op2 are equivalences, then we don't need a complete cross > >> + // product, just pairs of matching elements. > >> + if (relation_equiv_p (rel) && (lh == rh)) > > The ()s around lh == rh look superfluous to me. > Yeah I just found it marginally more readable, but it is superfluous > >> + { > >> + int_range_max tmp; > >> + r.set_undefined (); > >> + for (unsigned x = 0; x < num_lh; ++x) > > fold_range has an upper bound of num_lh * num_rh > 12, shouldn't something > > like that be there for this case too? > > I mean, every wi_fold_in_parts_equiv can result in 8 subranges, > > but num_lh could be up to 255 here, it is true it is linear and union_ > > should merge excess ones, but still I wonder if some larger num_lh upper > > bound like 20 or 32 wouldn't be useful. Up to you... > fold_range has the num_lh * num_rh limit because it was > quadratic/exponential and changes rapidly. Since this was linear based > on the number of sub ranges I didn't think it would matter much, but > sure, we can put a similar limit on it.. 16 seems reasonable. > >> + { > >> + wide_int lh_lb = lh.lower_bound (x); > >> + wide_int lh_ub = lh.upper_bound (x); > >> + wi_fold_in_parts_equiv (tmp, type, lh_lb, lh_ub); > >> + r.union_ (tmp); > >> + if (r.varying_p ()) > >> + break; > >> + } > >> + op1_op2_relation_effect (r, type, lh, rh, rel); > >> + update_known_bitmask (r, m_code, lh, rh); > >> + return true; > >> + } > >> + > > Jakub > > > Updated patch attached... I'll run it through testing whe the current > one is done. > > > Andrew