On Wed, Apr 22, 2026 at 5:12 PM Daniel Henrique Barboza
<[email protected]> wrote:
>
> Ping
>
> On 3/20/2026 2:02 PM, Daniel Henrique Barboza wrote:
> > From: Daniel Barboza <[email protected]>
> >
> > PR101179 requires a combination of patterns that transforms the code to
> > better utilize other existing patterns (mostly 122608).
> >
> > First, if we have a pattern that uses a zero_one that results in two
> > constants, and the constants are then used in a subsequent OP, we want
> > the constants to be explicit.  The reason is that 122608 has code that
> > will then distribute the OP along the constants. Some frontends already
> > do that, but not always.  E.g: given this C code:
> >
> >   int test(int y) { return y % (4 << ((y % 100 == 0) * 2)); }
> >
> > The C frontend will convert it to:
> >
> >   ;; enabled by -tree-original
> >   { return yD.4597 % (yD.4597 % 100 == 0 ? 16 : 4); }
> >
> > Which is what we want.  But this next C code will not be converted by the
> > frontend.  I haven't dig into it but my educated guess is that the extra
> > cast before the zero_one val is the deal breaker:
> >
> >   _Bool x = y % 100 == 0;
> >   return y % (4 << (x * 2));
> >
> >   ;; enabled by -tree-original
> >   {
> >     _BoolD.4579 xD.4596 = yD.4593 % 100 == 0;
> >     return yD.4593 % (4 << (intD.7) xD.4596 * 2);
> >   }
> >
> > I haven't dig into it but my educated guess is that the extra cast
> > before the zero_one val is not helping here.  But note that there's no
> > guarantee that all frontends will behave this way, and there's no
> > guarantee that we might introduce the pattern ourselves for other
> > unrelated optimizations that will affect these.  To be certain we're
> > introducing some of the most common patterns that generates constants,
> > ignoring cases where we want the pattern to be left untouched (e.g.
> > WIDEN_MULT_PLUS_EXPR).
> >
> > Second: in a case where we have a comparison with the same RHS in two
> > legs of a conditional, for example:
> >
> >     bb3:
> >     _14 = y_9(D) & 15;
> >     _3 = _14 == 0;
> >     goto <bb 5>;
> >
> >     bb4:
> >     _13 = y_9(D) & 3;
> >     _6 = _13 == 0;
> >
> >     bb5:
> >     # _5 = PHI <_6(4), _3(3)>
> >     _7 = (intD.7) _5;
> >
> > We want to move the comparison == 0 to the merge block, allowing other
> > optimizations that operate in single statement blocks to take place.
> > This gimple can be produced by patterns such as:
> >
> >    return x ? (y % 16) == 0 : (y % 4) == 0;

I don't think we want to handle this via match.pd patterns.  In PHI-OPT
we for example have code to hoist/sink conversions, in sinking there's
code to sink common stores.

So I'd rather view this as opportunity to enhance
factor_out_conditional_operation
from phiopt.

The other cases are an opportunity for the reverse transform, PRE handles
some of them, phiprop as well.  I think that

  <bb 2> :
  if (x_4(D) != 0)
    goto <bb 4>; [INV]
  else
    goto <bb 3>; [INV]

  <bb 3> :

  <bb 4> :
  # iftmp.0_3 = PHI <16(2), 4(3)>
  _1 = y_7(D) % iftmp.0_3;

would fit a new PHI-OPT kind similar to factor_out_conditional_operation,
but handling the case where either the resulting operations are a lot
cheaper or on one arm the constant is the neutral element (PRE handles
that case).

> >
> > And finally, w.r.t mod operations, we want to always convert them to
> > something lighter (like a lshift or bit_and).  For zero comparisons with
> > a integer_pow2 operand we want to do a bit_and instead, and the frontend
> > will convert it for us regardless of SSA_NAME sign. But the C frontend
> > won't convert a mod operand wrapped into a if, e.g. this C code:
> >
> >    return y % (x ? 16 : 4) == 0;
> >
> > We're adding a pattern that handles this case, turning this into:
> >
> >    return ((xD.4590 != 0 ? 15 : 3) & yD.4589) == 0;
> >
> > With all these patterns we will optimize and standardize all code
> > examples in PR101179.
> >
> > Bootstrapped on x86, aarch64 and rv64.
> > Regression tested on x86 and aarch64.
> >
> >       PR tree-optimization/101179
> >
> > gcc/ChangeLog:
> >
> >       * match.pd(`A OP ((CST1 shift (zero_one * CST2)))`): New
> >       pattern.
> >       (`A OP ((CST1 shift (zero_one*CST2)) + CST3)`): New pattern.
> >       (`A OP ((zero_one shift CST1) + CST2)`): New pattern.
> >       (`A OP (CST2 - (zero_one lshift CST1))`): New pattern.
> >       (`if cond (cmp (OP A B) C) (cmp (OP A D) C)`): New pattern.
> >       (`A % (cond X pow2A pow2B) EQ|NE 0`)`: New pattern.
> >
> > gcc/testsuite/ChangeLog:
> >
> >       * gcc.dg/tree-ssa/pr101179-2.c: New test.
> >       * gcc.dg/tree-ssa/pr101179-3.c: New test.
> >       * gcc.dg/tree-ssa/pr101179.c: New test.
> > ---
> >   gcc/match.pd                               | 133 ++++++++++++++++++++-
> >   gcc/testsuite/gcc.dg/tree-ssa/pr101179-2.c |  36 ++++++
> >   gcc/testsuite/gcc.dg/tree-ssa/pr101179-3.c |  11 ++
> >   gcc/testsuite/gcc.dg/tree-ssa/pr101179.c   |  95 +++++++++++++++
> >   4 files changed, 274 insertions(+), 1 deletion(-)
> >   create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr101179-2.c
> >   create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr101179-3.c
> >   create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr101179.c
> >
> > diff --git a/gcc/match.pd b/gcc/match.pd
> > index 7f16fd4e081..885e4413e08 100644
> > --- a/gcc/match.pd
> > +++ b/gcc/match.pd
> > @@ -8642,6 +8642,137 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> >      replace if (x == 0) with tem = ~x; if (tem != 0) which is
> >      clearly less optimal and which we'll transform again in forwprop.  */
> >
> > +/* PR101179: decompose the following zero_one patterns than creates
> > +   two CSTs into a cond based format.  E.g:
> > +
> > +   A OP (4 << (zero_one * 2)) ->  A OP (zero_one ? 16 : 4)
> > +
> > +   This will give optimizations like PR122608 the opportunity to
> > +   turn this further into:
> > +
> > +   "zero_one ? A OP CST1 : A OP CST2".  */
> > +(for op (plus minus mult bit_and bit_ior bit_xor lshift rshift
> > +      rdiv trunc_div ceil_div floor_div round_div exact_div
> > +      trunc_mod ceil_mod floor_mod round_mod)
> > + (for shift (lshift rshift)
> > +
> > +  /* A OP ((CST1 shift (zero_one * CST2))) ->
> > +     A OP (zero_one ? CST1 shift CST2 : CST1) */
> > +  (simplify
> > +   (op @0 (shift INTEGER_CST@1
> > +             (mult:c (convert? zero_one_valued_p@2) INTEGER_CST@3)))
> > +   (op @0 (cond @2 (shift @1 @3) @1)))
> > +  (simplify
> > +   (op (shift INTEGER_CST@1
> > +             (mult:c (convert? zero_one_valued_p@2) INTEGER_CST@3))
> > +     @0)
> > +   (op (cond @2 (shift @1 @3) @1) @0))
> > +
> > +  /* A OP ((CST1 shift (zero_one*CST2)) + CST3) ->
> > +     A OP (zero_one ? (CST1 shift CST2) + CST3 : CST1 + CST3) */
> > +  (simplify
> > +   (op @0 (plus:c
> > +        (shift INTEGER_CST@1
> > +                (mult:c (convert? zero_one_valued_p@2) INTEGER_CST@3))
> > +        INTEGER_CST@4))
> > +   (op @0 (cond @2 (plus (shift @1 @3) @4) (plus @1 @4))))
> > +  (simplify
> > +   (op (plus:c
> > +      (shift INTEGER_CST@1
> > +             (mult:c (convert? zero_one_valued_p@2) INTEGER_CST@3))
> > +      INTEGER_CST@4)
> > +       @0)
> > +   (op (cond @2 (plus (shift @1 @3) @4) (plus @1 @4)) @0))
> > +
> > +  /* A OP ((zero_one shift CST1) + CST2) ->
> > +     A OP (zero_one ? (1 shift CST1) + CST2 : CST2) */
> > +  (simplify
> > +   (op @0 (plus:c (shift zero_one_valued_p@2 INTEGER_CST@1) INTEGER_CST@3))
> > +   (op @0 (cond @2 (plus (shift { build_one_cst (type); } @1) @3)
> > +                 @3)))
> > +  (simplify
> > +   (op (plus:c (shift zero_one_valued_p@2 INTEGER_CST@1) INTEGER_CST@3) @0)
> > +   (op (cond @2 (plus (shift { build_one_cst (type); } @1) @3) @3)
> > +       @0))
> > +
> > +  /* A OP (CST2 - (zero_one lshift CST1)) ->
> > +     A OP (zero_one ? CST2 - (1 lshift CST1) : CST2) */
> > +  (simplify
> > +   (op @0 (minus INTEGER_CST@3
> > +             (lshift zero_one_valued_p@2 INTEGER_CST@1)))
> > +   (op @0 (cond @2 (minus @3 (lshift { build_one_cst (type); } @1)) @3)))
> > +  (simplify
> > +   (op (minus INTEGER_CST@3
> > +             (lshift zero_one_valued_p@2 INTEGER_CST@1)) @0)
> > +   (op (cond @2 (minus @3 (lshift { build_one_cst (type); } @1)) @3) @0))
> > +))
> > +
> > +/* PR101179: if cond (cmp (OP A B) C) (cmp (OP A D) C)
> > +
> > +   The idea here is to detect patterns like this C code:
> > +
> > +   (x ? (y%16) == 0 : (y%4) == 0)
> > +
> > +   That will generate phiopt dumps such as:
> > +
> > +   bb3:
> > +   _14 = y_9 & 15;
> > +   _3 = _14 == 0;
> > +   goto <bb 5>;
> > +
> > +   bb4:
> > +   _13 = y_9 & 3;
> > +   _6 = _13 == 0;
> > +
> > +   bb5:
> > +   # _5 = PHI <_6 (4), _3 (3)>
> > +   _7 = (intD.7) _5;
> > +
> > +   We want to turn that into a pattern more friendly to other
> > +   optimizations (like PR122608) by moving the comparison to
> > +   the merge block:
> > +
> > +      bb3:
> > +   _14 = y_9 & 15;
> > +   goto <bb 5>;
> > +
> > +   bb4:
> > +   _13 = y_9 & 3;
> > +
> > +   bb5:
> > +   # _5 = PHI <_6 (4), _3 (3)>
> > +   _6 = _5 == 0;
> > +   _7 = (intD.7) _5;
> > +
> > +   Note that several opcodes (plus/minus, div, min/max) do
> > +   not generate this pattern due to their own specific
> > +   comparison optimizations.  */
> > +(for op (mult bit_and bit_ior bit_xor lshift rshift
> > +      trunc_mod ceil_mod floor_mod round_mod)
> > + (for cmp (simple_comparison)
> > +  (simplify
> > +   (cond @0 (cmp (op @1 @2) @4) (cmp (op @1 @3) @4))
> > +   (cmp (op @1 (cond @0 @2 @3)) @4)))
> > +)
> > +
> > +/* PR101179: A % (cond X pow2A pow2B) EQ|NE 0 ->
> > +          A & (cond X pow2A-1 pow2B-1) EQ|NE 0
> > +
> > +   Turning a comparison between a mod with a pow2 operand and zero
> > +   into a bit_and is usually done by the frontend.  However the
> > +   frontend (from C, at least) is not able to do this conversion
> > +   if the mod is using the result of a cond with two pow2 operands.  */
> > +(for op (trunc_mod ceil_mod floor_mod round_mod)
> > + (for cmp (eq ne)
> > +  (simplify
> > +   (cmp (op @0 (cond@4 @3 integer_pow2p@1 integer_pow2p@2)) 
> > integer_zerop@5)
> > +    (if (INTEGRAL_TYPE_P (type)
> > +      && INTEGRAL_TYPE_P (TREE_TYPE (@0))
> > +      && tree_int_cst_sgn (@1) > 0
> > +      && tree_int_cst_sgn (@2) > 0)
> > +     (cmp (bit_and @0 (minus @4 { build_one_cst (TREE_TYPE (@4)); }))
> > +                   { build_zero_cst (TREE_TYPE (@0)); })))))
> > +
> >   #if GIMPLE
> >   /* From fold_binary_op_with_conditional_arg handle the case of
> >      rewriting (a ? b : c) > d to a ? (b > d) : (c > d) when the
> > @@ -8658,7 +8789,7 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> >          || !operation_could_trap_p (cmp, true, false, @3))
> >      (cond @0 (cmp!:type @1 @3) (cmp!:type @2 @3)))))
> >
> > -/* Similar to above:
> > +/* PR 122608 - Similar to above:
> >      (c ? a : b) op d  ->  c ? (a op d) : (b op d)
> >      But with non-vector binary ops, most of them non-commutative.  */
> >   (for op (plus minus mult bit_and bit_ior bit_xor
> > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr101179-2.c 
> > b/gcc/testsuite/gcc.dg/tree-ssa/pr101179-2.c
> > new file mode 100644
> > index 00000000000..f8e9be2d404
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr101179-2.c
> > @@ -0,0 +1,36 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-O2 -fdump-tree-phiopt1" } */
> > +
> > +/* Tests for the pattern
> > +   if cond X (cmp (OP A B) C) (cmp (OP A D) C) ->
> > +   (cmp (op A (cond X B D) C))).  */
> > +
> > +#define F(OP,NAME) \
> > + int NAME##_test (int y, _Bool x) { \
> > +   return x ? (y OP 16) == 0 : (y OP 4) == 0; \
> > +}
> > +
> > +F (%, mod)
> > +F (<<, lshift)
> > +F (>>, rshift)
> > +
> > +int bit_and_test (int y, _Bool x) {
> > +   return x ? (y & 63) != 1 : (y & 16) != 1;
> > +}
> > +
> > +int bit_ior_test (int y, _Bool x) {
> > +   return x ? (y | 3) < 0xF : (y | 8) < 0XF;
> > +}
> > +
> > +int bit_xor_test (int y, _Bool x) {
> > +  return x ? (y ^ 3) <= 13 : (y ^ 8) <= 13;
> > +}
> > +
> > +int mult_test (int y, _Bool x) {
> > +   return x ? (y * 3) > 0xF : (y * 2) > 0XF;
> > +}
> > +/* { dg-final { scan-tree-dump-times " == 0" 3 "phiopt1" } } */
> > +/* { dg-final { scan-tree-dump-times " != 1" 1 "phiopt1" } } */
> > +/* { dg-final { scan-tree-dump-times " <= 14" 1 "phiopt1" } } */
> > +/* { dg-final { scan-tree-dump-times " <= 13" 1 "phiopt1" } } */
> > +/* { dg-final { scan-tree-dump-times " > 15" 1 "phiopt1" } } */
> > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr101179-3.c 
> > b/gcc/testsuite/gcc.dg/tree-ssa/pr101179-3.c
> > new file mode 100644
> > index 00000000000..9ca0a6a03e4
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr101179-3.c
> > @@ -0,0 +1,11 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-O2 -fdump-tree-phiopt1" } */
> > +
> > +int test (int y, _Bool x) {
> > +  return y % (x ? 16 : 4) == 0;
> > +}
> > +
> > +/* { dg-final { scan-tree-dump-times " PHI <15" 1 "phiopt1" } } */
> > +/* { dg-final { scan-tree-dump-times " PHI <16" 0 "phiopt1" } } */
> > +/* { dg-final { scan-tree-dump-times " & " 1 "phiopt1" } } */
> > +/* { dg-final { scan-tree-dump-times " % " 0 "phiopt1" } } */
> > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr101179.c 
> > b/gcc/testsuite/gcc.dg/tree-ssa/pr101179.c
> > new file mode 100644
> > index 00000000000..f448e7a73fa
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr101179.c
> > @@ -0,0 +1,95 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-O2 -fdump-tree-phiopt1" } */
> > +
> > +#define F(OP,NAME) \
> > + int NAME##_test (int y, _Bool x) { \
> > +   return y OP (4 << (x * 3)); \
> > +}
> > +
> > +#define F2(OP,NAME) \
> > + int NAME##_test2 (int y, _Bool x) { \
> > +   return y OP ((4 << (x * 3)) + 1); \
> > +}
> > +
> > +#define F3(OP,NAME) \
> > + int NAME##_test3 (int y, _Bool x) { \
> > +   return y OP (32 >> (x * 3)); \
> > +}
> > +
> > +#define F4(OP,NAME) \
> > + int NAME##_test4 (int y, _Bool x) { \
> > +   return y OP ((32 >> (x * 3)) + 1); \
> > +}
> > +
> > +#define F5(OP,NAME) \
> > + int NAME##_test5 (int y, _Bool x) { \
> > +   return y OP ((x << 2) + 3); \
> > +}
> > +
> > +#define F6(OP,NAME) \
> > + int NAME##_test6 (int y, _Bool x) { \
> > +   return y OP (7 - (x << 2)); \
> > +}
> > +
> > +/* Same patterns from above but switching operands */
> > +#define F_alt(OP,NAME) \
> > + int NAME##_test_alt (int y, _Bool x) { \
> > +   return (4 << (x * 3)) OP y; \
> > +}
> > +
> > +#define F_alt2(OP,NAME) \
> > + int NAME##_test_alt2 (int y, _Bool x) { \
> > +   return ((4 << (x * 3)) + 1) OP y; \
> > +}
> > +
> > +#define F_alt3(OP,NAME) \
> > + int NAME##_test_alt3 (int y, _Bool x) { \
> > +   return (32 >> (x * 3)) OP y; \
> > +}
> > +
> > +#define F_alt4(OP,NAME) \
> > + int NAME##_test_alt4 (int y, _Bool x) { \
> > +   return ((32 >> (x * 3)) + 1) OP y; \
> > +}
> > +
> > +#define F_alt5(OP,NAME) \
> > + int NAME##_test_alt5 (int y, _Bool x) { \
> > +   return ((x << 2) + 3) OP y; \
> > +}
> > +
> > +#define F_alt6(OP,NAME) \
> > + int NAME##_test_alt6 (int y, _Bool x) { \
> > +   return (7 - (x << 2)) OP y; \
> > +}
> > +
> > +#define T(OP,NAME) \
> > +  F (OP, NAME) \
> > +  F2 (OP, NAME) \
> > +  F3 (OP, NAME) \
> > +  F4 (OP, NAME) \
> > +  F5 (OP, NAME) \
> > +  F6 (OP, NAME) \
> > +  F_alt (OP, NAME) \
> > +  F_alt2 (OP, NAME) \
> > +  F_alt3 (OP, NAME) \
> > +  F_alt4 (OP, NAME) \
> > +  F_alt5 (OP, NAME) \
> > +  F_alt6 (OP, NAME)
> > +
> > +T (+, plus)
> > +T (-, minus)
> > +T (*, mult)
> > +T (|, bit_and)
> > +T (|, bit_ior)
> > +T (^, bit_xor)
> > +T (/, div)
> > +T (%, mod)
> > +T (<<, lshift)
> > +T (>>, rshift)
> > +
> > +/* { dg-final { scan-tree-dump-times "PHI <32" 20 "phiopt1" } } */
> > +/* { dg-final { scan-tree-dump-times "PHI <33" 20 "phiopt1" } } */
> > +/* { dg-final { scan-tree-dump-times "PHI <4" 20 "phiopt1" } } */
> > +/* { dg-final { scan-tree-dump-times "PHI <5" 20 "phiopt1" } } */
> > +/* { dg-final { scan-tree-dump-times "PHI <7" 20 "phiopt1" } } */
> > +/* { dg-final { scan-tree-dump-times "PHI <3\\(" 20 "phiopt1" } } */
>

Reply via email to