https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70359
--- Comment #34 from rguenther at suse dot de <rguenther at suse dot de> --- On Thu, 8 Mar 2018, aldyh at gcc dot gnu.org wrote: > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70359 > > --- Comment #32 from Aldy Hernandez <aldyh at gcc dot gnu.org> --- > As mentioned in the previous comment, the proposed patch brings down the count > from 116 to 108 on ARM, but is shy of the desired 96. > > The missing bytes can be attributed to forwprop folding this (IL expanded for > illustration): > > if (ui_7 / 10 != 0) > > into: > > if (ui_7 > 9) > > More specifically, changing this: > > # ui_7 = PHI <ui_13(2), ui_21(3)> > ... > ui_21 = ui_7 / 10; > if (ui_21 != 0) > > into: > > # ui_7 = PHI <ui_13(2), ui_21(3)> > ... > ui_21 = ui_7 / 10; > if (ui_7 > 9) > > Inhibiting this optimization brings down the byte count to 92 which is even > lower than our 96 boogie man, so perhaps worth pursuing. (Assumes my proposed > patch is also applied.) I'm no expert, but isn't a EQ/NE with 0 preferable > than a <> with a non-zero? > > If so, should we restrict the folding somewhat, or clean this up after the > fact? > > For reference, the folding (in forwprop) is due to this match.pd pattern: > > /* X / C1 op C2 into a simple range test. */ > > ...though eliminating it causes another pattern to pick up the slack and do > the > same: > > /* Transform: > * (X / Y) == 0 -> X < Y if X, Y are unsigned. > * (X / Y) != 0 -> X >= Y, if X, Y are unsigned. > */ > > Eliminating both patterns "fixes" the problem. > > Suggestions welcome :). In other places where this happened we add single_use () checks to the patterns. The :s isn't really effective for patterns that simplify into "simple" expressions. A patch doing that to the two patterns in question would be OK I think. Sth like Index: gcc/match.pd =================================================================== --- gcc/match.pd (revision 258359) +++ gcc/match.pd (working copy) @@ -1290,11 +1290,12 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) /* X / C1 op C2 into a simple range test. */ (for cmp (simple_comparison) (simplify - (cmp (trunc_div:s @0 INTEGER_CST@1) INTEGER_CST@2) + (cmp (trunc_div:s@3 @0 INTEGER_CST@1) INTEGER_CST@2) (if (INTEGRAL_TYPE_P (TREE_TYPE (@0)) && integer_nonzerop (@1) && !TREE_OVERFLOW (@1) - && !TREE_OVERFLOW (@2)) + && !TREE_OVERFLOW (@2) + && single_use (@3)) (with { tree lo, hi; bool neg_overflow; enum tree_code code = fold_div_compare (cmp, @1, @2, &lo, &hi, &neg_overflow); } tough there is at least one constant simplification result where we shouldn't apply this restriction so moving the single_use check to multiple places below or factoring things a bit would be appropriate. Similar for the other pattern then.