On 3/9/2026 12:39 PM, Daniel Henrique Barboza wrote:
> From: Daniel Barboza <[email protected]>
>
> Identify cases where a zero_one comparison is used to conditionally
> assign a constant and turn that into an unconditional PLUS.  For the
> code in PR71336:
>
> int test(int a) {
>      return a & 1 ? 7 : 3;
> }
>
> We'll turn that into "(a&1) * (7 - 3) + 3", which yields the same
> results but without the conditional, opening up more optimization
> opportunities.  On an armv8-a target the original code generates:
>
> tst     x0, 1   // 38   [c=8 l=4]  *anddi3nr_compare0_zextract
> mov     w1, 3   // 41   [c=4 l=4]  *movsi_aarch64/3
> mov     w0, 7   // 42   [c=4 l=4]  *movsi_aarch64/3
> csel    w0, w1, w0, eq  // 17   [c=4 l=4]  *cmovsi_insn/0
> ret             // 47   [c=0 l=4]  *do_return
>
> With this transformation:
>
> ubfiz   w0, w0, 2, 1    // 7    [c=4 l=4]  *andim_ashiftsi_bfiz
> add     w0, w0, 3       // 13   [c=4 l=4]  *addsi3_aarch64/0
> ret             // 21   [c=0 l=4]  *do_return
>
> Similar gains are noticeable in RISC-V and x86.
>
> For completeness' sake we're also adding the variant "zero_one == 0".
> Both transformations check for type <= word_size to avoid introducing a
> wide integer multiplication that the target will have trouble dealing
> with.
>
> Bootstrapped and regression tested on x86 and aarch64.
>
>       PR tree-optimization/71336
>
> gcc/ChangeLog:
>
>       * match.pd (`zero_one EQ|NE 0 ? CST1:CST2`): New pattern.
>
> gcc/testsuite/ChangeLog:
>
>       * gcc.dg/tree-ssa/pr71336-2.c: New test.
>       * gcc.dg/tree-ssa/pr71336.c: New test.
> ---
>
> Changes from v1:
> - add type <= word_size check to avoid a wide int multiplication, as
>    suggested by Richard
> - v1 link: https://gcc.gnu.org/pipermail/gcc-patches/2026-March/710125.html
>
>   gcc/match.pd                              | 38 +++++++++++++++
>   gcc/testsuite/gcc.dg/tree-ssa/pr71336-2.c | 59 +++++++++++++++++++++++
>   gcc/testsuite/gcc.dg/tree-ssa/pr71336.c   | 20 ++++++++
>   3 files changed, 117 insertions(+)
>   create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr71336-2.c
>   create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr71336.c
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 7f16fd4e081..590575ea2e0 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -5195,6 +5195,44 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>          && expr_no_side_effects_p (@2))
>          (op (mult (convert:type @0) @2) @1))))
>   
> +/* PR71336:
> +   zero_one != 0 ? CST1 : CST2 -> ((typeof (CST2))zero_one * diff) + CST2,
> +   where CST1 > CST2 and diff = CST1 - CST2.
> +
> +   Includes the "zero_one == 0 ? (...)" variant too.  */
> +(for cmp (ne eq)
> + (simplify
> +  (cond (cmp zero_one_valued_p@0 integer_zerop) INTEGER_CST@1 INTEGER_CST@2)
> +  (with {
> +    unsigned HOST_WIDE_INT diff = 0;
> +
> +    if (tree_int_cst_sgn (@1) > 0 && tree_int_cst_sgn (@2) > 0
> +     && tree_fits_uhwi_p (@1) && tree_fits_uhwi_p (@2))
> +     {
> +     if (cmp == NE_EXPR
> +         && wi::gtu_p (wi::to_wide (@1), wi::to_wide (@2)))
> +       diff = tree_to_uhwi (@1) - tree_to_uhwi (@2);
> +
> +     if (cmp == EQ_EXPR
> +         && wi::gtu_p (wi::to_wide (@2), wi::to_wide (@1)))
> +       diff = tree_to_uhwi (@2) - tree_to_uhwi (@1);
> +     }
> +   }
> +   (if (cmp == NE_EXPR
> +     && INTEGRAL_TYPE_P (type)
> +     && TYPE_PRECISION (type) <= BITS_PER_WORD
> +     && INTEGRAL_TYPE_P (TREE_TYPE (@0))
> +     && diff > 0)
> +     (plus (mult (convert:type @0) { build_int_cst (type, diff); })
> +         @2)
> +    (if (cmp == EQ_EXPR
> +      && INTEGRAL_TYPE_P (type)
> +      && TYPE_PRECISION (type) <= BITS_PER_WORD
> +      && INTEGRAL_TYPE_P (TREE_TYPE (@0))
> +      && diff > 0)
> +      (plus (mult (convert:type @0) { build_int_cst (type, diff); })
> +          @1))))))
Note the parallels in how EQ/NE get handled.  Could we meaningfully 
simplify the code by creating two new locals within the WITH block, 
holding @1 and @2 initially, then conditionally swapping them if 
necessary?  That should (in theory) allow some code de-duplication.
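To make the de-duplication idea concrete, here is a rough C model of the fold (the function name and shape are illustrative only, not the actual match.pd code): swapping the two constants up front for the EQ variant lets both comparisons share a single formula.

```c
#include <assert.h>

/* Illustrative C model of the de-duplication idea (not match.pd code):
   swap the two constants up front for the EQ variant so that both
   "zero_one != 0 ? cst1 : cst2" and "zero_one == 0 ? cst1 : cst2"
   reduce to the single formula  zero_one * (hi - lo) + lo.  */
static unsigned int
fold_zero_one_cond (int is_ne, unsigned int zero_one,
                    unsigned int cst1, unsigned int cst2)
{
  /* hi is the constant selected when zero_one is 1.  */
  unsigned int hi = is_ne ? cst1 : cst2;
  unsigned int lo = is_ne ? cst2 : cst1;

  /* Mirrors the diff > 0 guard in the patch: only fold when the
     constant picked for zero_one == 1 is the larger one.  */
  assert (hi > lo);
  return zero_one * (hi - lo) + lo;
}
```

For the PR71336 example, fold_zero_one_cond (1, a & 1, 7, 3) computes (a & 1) * 4 + 3, matching the transformed gimple.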

I don't know if it's been discussed, but do we want to limit this to 
cases where the multiplication is by 2^n and thus implementable via a 
shift?  Of course that then raises the question of whether we should 
handle *3, *5 and *9 specially too, but that's probably getting too 
close to catering to specific targets.  There are also BZs around 
revamping expansion to do something more sensible with these MULT 
sequences.  Raphael's patch didn't work the way we wanted, but I think 
Andrew and I both believe it shows a path forward for steering those 
MULT operations into conditional move expanders.  So, yes, maybe just 
leave this as-is in the expectation that we'll adjust the gimple->rtl 
interface to change how we generate code for 0/1 * C to conditionally 
select between 0 and C.
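If the shift-only restriction were adopted, the guard and the resulting shape of the fold could look roughly like this C model (helper names are made up for illustration; the actual check would live in the match.pd condition):

```c
#include <assert.h>

/* Sketch: a diff is shift-implementable when it is a power of two.  */
static int
diff_is_pow2 (unsigned long long diff)
{
  return diff != 0 && (diff & (diff - 1)) == 0;
}

/* Sketch of zero_one * diff + base emitted as a shift when diff == 2^n,
   matching the ubfiz/add sequence in the armv8-a example.  Callers are
   expected to have checked diff_is_pow2 (diff) first.  */
static unsigned int
fold_with_shift (unsigned int zero_one, unsigned int diff,
                 unsigned int base)
{
  unsigned int shift = 0;
  while ((1u << shift) < diff)
    shift++;
  return (zero_one << shift) + base;
}
```

For the PR71336 case, diff = 7 - 3 = 4 passes diff_is_pow2, and fold_with_shift (a & 1, 4, 3) is the shift-plus-add form.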

So barring other comments I think this is generally OK for gcc-17.  If 
the cleanup is possible I'd like to see it, but if it doesn't reduce 
the code duplication, then we can go with the change as-is.

Jeff
