On 3/17/2026 5:10 PM, Jeffrey Law wrote:

On 3/9/2026 12:39 PM, Daniel Henrique Barboza wrote:
From: Daniel Barboza <[email protected]>

Identify cases where a zero_one comparison is used to conditionally
assign a constant and turn that into an unconditional PLUS.  For the
code in PR71336:

int test(int a) {
      return a & 1 ? 7 : 3;
}

We'll turn that into "(a&1) * (7 - 3) + 3", which yields the same
result but without the conditional, exposing more optimization
opportunities.  On an armv8-a target the original code generates:

tst     x0, 1   // 38   [c=8 l=4]  *anddi3nr_compare0_zextract
mov     w1, 3   // 41   [c=4 l=4]  *movsi_aarch64/3
mov     w0, 7   // 42   [c=4 l=4]  *movsi_aarch64/3
csel    w0, w1, w0, eq  // 17   [c=4 l=4]  *cmovsi_insn/0
ret             // 47   [c=0 l=4]  *do_return

With this transformation:

ubfiz   w0, w0, 2, 1    // 7    [c=4 l=4]  *andim_ashiftsi_bfiz
add     w0, w0, 3       // 13   [c=4 l=4]  *addsi3_aarch64/0
ret             // 21   [c=0 l=4]  *do_return

Similar gains are noticeable in RISC-V and x86.

For completeness' sake we're also adding the "zero_one == 0" variant.
Both transformations check for type <= word_size to avoid introducing a
wide integer multiplication that the target may have trouble dealing
with.

Bootstrapped and regression tested on x86 and aarch64.

        PR tree-optimization/71336

gcc/ChangeLog:

        * match.pd (`zero_one EQ|NE 0 ? CST1:CST2`): New pattern.

gcc/testsuite/ChangeLog:

        * gcc.dg/tree-ssa/pr71336-2.c: New test.
        * gcc.dg/tree-ssa/pr71336.c: New test.
---

Changes from v1:
- add type <= word_size check to avoid a wide int multiplication, as
    suggested by Richard
- v1 link: https://gcc.gnu.org/pipermail/gcc-patches/2026-March/710125.html

   gcc/match.pd                              | 38 +++++++++++++++
   gcc/testsuite/gcc.dg/tree-ssa/pr71336-2.c | 59 +++++++++++++++++++++++
   gcc/testsuite/gcc.dg/tree-ssa/pr71336.c   | 20 ++++++++
   3 files changed, 117 insertions(+)
   create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr71336-2.c
   create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr71336.c

diff --git a/gcc/match.pd b/gcc/match.pd
index 7f16fd4e081..590575ea2e0 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -5195,6 +5195,44 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
          && expr_no_side_effects_p (@2))
          (op (mult (convert:type @0) @2) @1))))
+/* PR71336:
+   zero_one != 0 ? CST1 : CST2 -> ((typeof (CST2))zero_one * diff) + CST2,
+   where CST1 > CST2 and diff = CST1 - CST2.
+
+   Includes the "zero_one == 0 ? (...)" variant too.  */
+(for cmp (ne eq)
+ (simplify
+  (cond (cmp zero_one_valued_p@0 integer_zerop) INTEGER_CST@1 INTEGER_CST@2)
+  (with {
+    unsigned HOST_WIDE_INT diff = 0;
+
+    if (tree_int_cst_sgn (@1) > 0 && tree_int_cst_sgn (@2) > 0
+       && tree_fits_uhwi_p (@1) && tree_fits_uhwi_p (@2))
+     {
+       if (cmp == NE_EXPR
+           && wi::gtu_p (wi::to_wide (@1), wi::to_wide (@2)))
+         diff = tree_to_uhwi (@1) - tree_to_uhwi (@2);
+
+       if (cmp == EQ_EXPR
+           && wi::gtu_p (wi::to_wide (@2), wi::to_wide (@1)))
+         diff = tree_to_uhwi (@2) - tree_to_uhwi (@1);
+     }
+   }

+   (if (cmp == NE_EXPR
+       && INTEGRAL_TYPE_P (type)
+       && TYPE_PRECISION (type) <= BITS_PER_WORD
+       && INTEGRAL_TYPE_P (TREE_TYPE (@0))
+       && diff > 0)
+     (plus (mult (convert:type @0) { build_int_cst (type, diff); })
+           @2)
+    (if (cmp == EQ_EXPR
+        && INTEGRAL_TYPE_P (type)
+        && TYPE_PRECISION (type) <= BITS_PER_WORD
+        && INTEGRAL_TYPE_P (TREE_TYPE (@0))
+        && diff > 0)
+      (plus (mult (convert:type @0) { build_int_cst (type, diff); })
+            @1))))))
Note the parallels in how EQ/NE get handled.  Could we meaningfully
simplify the code by creating two new locals within the WITH holding @1
and @2 initially, then conditionally swapping them if necessary?  That
should (in theory) allow some code de-duplication.
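One way to picture that de-duplication in plain C (a sketch only; the helper name is hypothetical, not match.pd code): normalize the EQ case by swapping the two constants up front, then run a single shared computation.

```c
#include <assert.h>

/* Sketch of the suggested swap-based normalization.  For EQ the two
   constants play opposite roles, so swapping them first lets one code
   path compute diff and the additive base.  Returns 0 when the pattern
   would not fire (non-positive difference).  */
static int
compute_diff_and_base (int is_eq, unsigned cst1, unsigned cst2,
                       unsigned *diff, unsigned *base)
{
  unsigned hi = cst1, lo = cst2;
  if (is_eq)
    {
      hi = cst2;
      lo = cst1;
    }
  if (hi <= lo)
    return 0;
  *diff = hi - lo;   /* multiplier for zero_one */
  *base = lo;        /* constant added afterwards */
  return 1;
}
```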

I don't know if it's been discussed, but do we want to limit this to
cases where the multiplication is by 2^n and thus implementable via a
shift?  Of course that then raises the question of whether we should
handle *3, *5 and *9 specially too, but that's probably getting too
close to catering to specific targets.
Of course there are also BZs around revamping expansion to do something
more sensible with these MULT sequences.  Raphael's patch didn't work
the way we wanted, but I think Andrew and I both think it shows a path
forward to steering those MULT operations into conditional move
expanders.  So, yeah, maybe just leave this as-is in the expectation
that we'll adjust the gimple->rtl interface so that code for 0/1 * C
conditionally selects between 0 and C.

So, about that... I'm afraid I'll have to change the patterns
generated here from 'mult' based to 'lshift' based.

The reason is that this work, as it stands, is not playing nicely with
the work I'm currently finishing for 101179, an optimization around
"mod" ops.  I have a patch to fix that optimization that requires a
handful of extra patterns, and one of those patterns needs to address
this code:

int f (int y) {
   int x = y % 100 == 0;
   return y & ((4 << (x * 2)) - 1);
}

The best solution I found is to recompose both CSTs back and let
122608 do the rest.  122608 will do this:

(c ? a : b) op d  ->  c ? (a op d) : (b op d)

And this ends up generating good code for 101179.
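Worked by hand on the example above (a sketch, not compiler output): x is 0 or 1, so the mask (4 << (x * 2)) - 1 is either 3 or 15, and the 122608-style distribution pushes the AND into each arm of the conditional.

```c
#include <assert.h>

/* Original form from the message.  */
static int
f_orig (int y)
{
  int x = y % 100 == 0;
  return y & ((4 << (x * 2)) - 1);
}

/* After distributing per (c ? a : b) op d -> c ? (a op d) : (b op d):
   x == 1 gives mask (4 << 2) - 1 = 15, x == 0 gives 4 - 1 = 3.  */
static int
f_dist (int y)
{
  return (y % 100 == 0) ? (y & 15) : (y & 3);
}
```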

Note that recreating the CSTs is the opposite of what we're doing here
in 71336.  Which is fine, as long as we can revert to the CSTs when
needed, i.e. 101179 needs to undo the "(zero_one * diff) + CST"
pattern introduced here.

Doing that in 101179 caused a lot of hassle in the regression tests
for both x86 and aarch64.  The pattern "(zero_one * diff) + CST"
introduced here happens to match WIDEN_MULT_PLUS_EXPR, an operation
that both x86 and aarch64 have special insns for, IIUC.  So I can't
safely undo what 71336 is doing now without getting in the way of
desirable WIDEN_MULT_PLUS_EXPR patterns.

A solution would be changing this patch from "(zero_one * diff) + CST"
to "(zero_one << log2(diff)) + CST".  Both generate the same code
as far as 71336 goes.  It will limit the number of cases we'll
be able to cover, but it will coexist nicely with 101179.  In fact,
if everyone is ok with this change, I'll send both 101179 and a new
version of 71336 in the same series, since they're coupled together
in the end.
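For the cases the shift form can still cover, i.e. diff a power of two, the two forms compute the same thing; a sketch of that equivalence (helper names hypothetical):

```c
#include <assert.h>

/* Mult-based form from the current patch: zero_one * diff + cst2.  */
static int
ne_mult (int zo, int cst1, int cst2)
{
  return zo * (cst1 - cst2) + cst2;
}

/* Shift-based replacement: zero_one << log2(diff) + cst2, valid only
   when diff is a power of two.  */
static int
ne_shift (int zo, int cst1, int cst2)
{
  int diff = cst1 - cst2;
  int log2 = 0;
  while ((1 << log2) < diff)   /* diff assumed a power of two */
    log2++;
  return (zo << log2) + cst2;
}
```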


All this said, if we're really invested in the mult-based format
for this patch and want to preserve it, another solution would be
to make it apply only when 122608 isn't applicable, e.g. we'd need
to check whether the leftmost capture of the pattern has a single
immediate use (single_imm_use()).  match.pd does not give access to
the leftmost capture, so we would need to either do a lot of work
inside a (with) block (access the associated gphi, check whether its
result satisfies single_imm_use()) or give up on match.pd and do it
in tree-ssa-phiopt or similar.  I didn't see any other match.pd code
going that far and didn't pursue it, and I'm a bit skeptical about
the effort involved versus the actual gains of doing this outside of
match.pd.  I'm all ears though.


Thanks,
Daniel





So barring other comments I think this is generally OK for gcc-17. If
the cleanup is possible I'd like to see that, but if it doesn't clean up
the code duplication, then obviously we could go with the change as-is.

Jeff
