On 3/9/2026 12:39 PM, Daniel Henrique Barboza wrote:
> From: Daniel Barboza <[email protected]>
>
> Identify cases where a zero_one comparison is used to conditionally
> assign a constant and turn that into an unconditional PLUS. For the
> code in PR71336:
>
> int test(int a) {
> return a & 1 ? 7 : 3;
> }
>
> We'll turn that into "(a&1) * (7 - 3) + 3", which yields the same
> result but without the conditional, opening up more optimization
> opportunities. On an armv8-a target the original code generates:
>
> tst x0, 1 // 38 [c=8 l=4] *anddi3nr_compare0_zextract
> mov w1, 3 // 41 [c=4 l=4] *movsi_aarch64/3
> mov w0, 7 // 42 [c=4 l=4] *movsi_aarch64/3
> csel w0, w1, w0, eq // 17 [c=4 l=4] *cmovsi_insn/0
> ret // 47 [c=0 l=4] *do_return
>
> With this transformation:
>
> ubfiz w0, w0, 2, 1 // 7 [c=4 l=4] *andim_ashiftsi_bfiz
> add w0, w0, 3 // 13 [c=4 l=4] *addsi3_aarch64/0
> ret // 21 [c=0 l=4] *do_return
>
> Similar gains are noticeable on RISC-V and x86.
>
> For completeness' sake we also handle the variant "zero_one == 0".
> Both transformations check that the type is no wider than a word to
> avoid introducing a wide integer multiplication that the target may
> have trouble dealing with.
>
> Bootstrapped and regression tested on x86 and aarch64.
>
> PR tree-optimization/71336
>
> gcc/ChangeLog:
>
> * match.pd (`zero_one EQ|NE 0 ? CST1:CST2`): New pattern.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.dg/tree-ssa/pr71336-2.c: New test.
> * gcc.dg/tree-ssa/pr71336.c: New test.
> ---
>
> Changes from v1:
> - add type <= word_size check to avoid a wide int multiplication, as
> suggested by Richard
> - v1 link: https://gcc.gnu.org/pipermail/gcc-patches/2026-March/710125.html
>
> gcc/match.pd | 38 +++++++++++++++
> gcc/testsuite/gcc.dg/tree-ssa/pr71336-2.c | 59 +++++++++++++++++++++++
> gcc/testsuite/gcc.dg/tree-ssa/pr71336.c | 20 ++++++++
> 3 files changed, 117 insertions(+)
> create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr71336-2.c
> create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr71336.c
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 7f16fd4e081..590575ea2e0 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -5195,6 +5195,44 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> && expr_no_side_effects_p (@2))
> (op (mult (convert:type @0) @2) @1))))
>
> +/* PR71336:
> + zero_one != 0 ? CST1 : CST2 -> ((typeof (CST2))zero_one * diff) + CST2,
> + where CST1 > CST2 and diff = CST1 - CST2.
> +
> + Includes the "zero_one == 0 ? (...)" variant too. */
> +(for cmp (ne eq)
> + (simplify
> + (cond (cmp zero_one_valued_p@0 integer_zerop) INTEGER_CST@1 INTEGER_CST@2)
> + (with {
> + unsigned HOST_WIDE_INT diff = 0;
> +
> + if (tree_int_cst_sgn (@1) > 0 && tree_int_cst_sgn (@2) > 0
> + && tree_fits_uhwi_p (@1) && tree_fits_uhwi_p (@2))
> + {
> + if (cmp == NE_EXPR
> + && wi::gtu_p (wi::to_wide (@1), wi::to_wide (@2)))
> + diff = tree_to_uhwi (@1) - tree_to_uhwi (@2);
> +
> + if (cmp == EQ_EXPR
> + && wi::gtu_p (wi::to_wide (@2), wi::to_wide (@1)))
> + diff = tree_to_uhwi (@2) - tree_to_uhwi (@1);
> + }
> + }
> + (if (cmp == NE_EXPR
> + && INTEGRAL_TYPE_P (type)
> + && TYPE_PRECISION (type) <= BITS_PER_WORD
> + && INTEGRAL_TYPE_P (TREE_TYPE (@0))
> + && diff > 0)
> + (plus (mult (convert:type @0) { build_int_cst (type, diff); })
> + @2)
> + (if (cmp == EQ_EXPR
> + && INTEGRAL_TYPE_P (type)
> + && TYPE_PRECISION (type) <= BITS_PER_WORD
> + && INTEGRAL_TYPE_P (TREE_TYPE (@0))
> + && diff > 0)
> + (plus (mult (convert:type @0) { build_int_cst (type, diff); })
> + @1))))))
Note the parallels in how EQ/NE get handled. Could we meaningfully
simplify the code by creating two new locals within the WITH, holding @1
and @2 initially, then conditionally swapping them if necessary? That
should (in theory) allow some code de-duplication.

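Concretely, that de-duplication might look something like the rough, untested sketch below. The locals "lo" (value selected when @0 is 0) and "hi" (value selected when @0 is 1) are names of my own invention:

```
(for cmp (ne eq)
 (simplify
  (cond (cmp zero_one_valued_p@0 integer_zerop) INTEGER_CST@1 INTEGER_CST@2)
  (with {
     /* "lo" is selected when @0 is 0, "hi" when @0 is 1;
	swapped for the EQ variant.  */
     tree lo = (cmp == NE_EXPR) ? @2 : @1;
     tree hi = (cmp == NE_EXPR) ? @1 : @2;
     unsigned HOST_WIDE_INT diff = 0;
     if (tree_int_cst_sgn (lo) > 0 && tree_int_cst_sgn (hi) > 0
	 && tree_fits_uhwi_p (lo) && tree_fits_uhwi_p (hi)
	 && wi::gtu_p (wi::to_wide (hi), wi::to_wide (lo)))
       diff = tree_to_uhwi (hi) - tree_to_uhwi (lo);
   }
   (if (INTEGRAL_TYPE_P (type)
	&& TYPE_PRECISION (type) <= BITS_PER_WORD
	&& INTEGRAL_TYPE_P (TREE_TYPE (@0))
	&& diff > 0)
    (plus (mult (convert:type @0) { build_int_cst (type, diff); })
	  { lo; })))))
```

That would collapse the two result arms into one and drop the per-cmp checks in the WITH block.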
I don't know if it's been discussed, but do we want to limit this to
cases where the multiplication is by 2^n and thus implementable via a
shift? Of course that raises the question of whether we should handle
*3, *5 and *9 specially too, but that's probably getting too close to
catering to specific targets.

Of course there are also BZs around revamping expansion to do something
more sensible with these MULT sequences. Raphael's patch didn't work the
way we wanted, but I think Andrew and I both believe it shows a path
forward for steering those MULT operations into conditional move
expanders. So, yea, maybe just leave this as-is in the expectation that
we'll adjust the gimple->rtl interface so that code generated for
0/1 * C conditionally selects between 0 and C.

So barring other comments I think this is generally OK for gcc-17. If
the cleanup is possible I'd like to see it, but if it doesn't remove the
code duplication, then obviously we can go with the change as-is.
Jeff