On 4/16/2026 6:29 AM, Daniel Henrique Barboza wrote:
From: Daniel Barboza <[email protected]>
Canonicalize right shift non-equality comparisons with constants by
turn them into a comparison with a left shifted constant. Assuming the
generic format:
(A >> CST1) CMP CST2
For CMP (<, >=) we'll compare A with CST2 left shifted by CST1:
- (A >> CST1) < CST2 -> A < (CST2 << CST1)
- (A >> CST1) >= CST2 -> A >= (CST2 << CST1)
And for CMP (<=, >) we need to IOR the lower CST1 bits from the left
shift:
- (A >> CST1) <= CST2 -> A <= (CST2 << CST1) | mask
- (A >> CST1) > CST2 -> A > (CST2 << CST1) | mask
Given that the right hand side changes involves just constants, in the
end we'll replace a rshift + cmp with just a cmp.
Bootstrapped and regression tested in x86, aarch64 and RISC-V.
PR tree-optimization/124808
gcc/ChangeLog:
* match.pd(`(A >> CST1) CMP CST2`): New pattern.
gcc/testsuite/ChangeLog:
* gcc.dg/tree-ssa/pr124808-2.c: New test.
* gcc.dg/tree-ssa/pr124808.c: New test.
---
Changes from v2:
- added "#if GIMPLE" and "single_use" conditions in the pattern
- changed pr124808-2.c to scan 'forwprop1' dump instead of 'gimple'
- reverted all changes made in sat-1.c
- reverted all changes made in pr104479.c
- v2 link: https://gcc.gnu.org/pipermail/gcc-patches/2026-April/713005.html
gcc/match.pd | 58 ++++++++++++++++
gcc/testsuite/gcc.dg/tree-ssa/pr124808-2.c | 44 ++++++++++++
gcc/testsuite/gcc.dg/tree-ssa/pr124808.c | 78 ++++++++++++++++++++++
3 files changed, 180 insertions(+)
create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr124808-2.c
create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr124808.c
diff --git a/gcc/match.pd b/gcc/match.pd
index 7b652afb43d..8eece2223f9 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -5367,6 +5367,64 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
@0)))))
#endif
+#if GIMPLE
+/* PR124808: (A >> CST1) CMP CST2 -> A CMP (CST2 << CST1)
+ Canonicalize non-equality comparisons between a right
+ shift and a constant, turning it into a comparison
+ with a constant that is left shifted. If we view A as:
+
+ A: "|--- CST2 ----|--CST1--|"
+
+ A >> CST1 will be equal to CST2 for all A values in the
+ range CST2 << CST1 to (CST2 << CST1) | 1s_mask (CST1).
+
+ Therefore:
+ - (A >> CST1) < CST2 -> A < (CST2 << CST1), A must
+ be smaller than all values from the range;
+ - (A >> CST1) <= CST2 -> A <= (CST2 << CST1) | mask,
+ A must be smaller or equal than the range end;
+ - (A >> CST1) > CST2 -> A > (CST2 << CST1) | mask,
+ A must be greater than all values from the range;
+ - (A >> CST1) >= CST2 -> A >= (CST2 << CST1), A must
+ be greater or equal than the range start.
+
+ We're also using "single_use" and wrapping around "if GIMPLE"
+ because (1) this class "VA1 LSHIFT/RSHIFT VAL2 CMP VAL3" of
+ optimizations tend to match CTZ|CLZ builtin patterns and we
+ don't want to trip on them and (2) we will get in the way of
+ certain optimizations (see ARM's sat-1.c test) that are done
+ using GENERIC due to how forwprop currently works. */
+(for cmp (le lt ge gt)
+ (simplify
+ (cmp (rshift@3 @0 INTEGER_CST@1) INTEGER_CST@2)
+ (if (INTEGRAL_TYPE_P (TREE_TYPE (@0))
+ && single_use (@3)
+ && tree_fits_uhwi_p (@1)
+ && tree_to_uhwi (@1) < TYPE_PRECISION (TREE_TYPE (@0))
+ && tree_int_cst_sgn (@2) >= 0
+ /* If @2 is nonzero check if we'll overflow when doing
+ @2 << @1 by doing a wi::lshift and checking if the
+ result is zero (i.e. overflow). */
+ && (wi::to_wide (@2) == 0
+ || !wi::eq_p (wi::lshift (wi::to_wide (@2), wi::to_wide (@1)),
+ wi::zero (TYPE_PRECISION (TREE_TYPE (@2))))))
So you're using the constant left shifted to see if there's an
overflow. But I'm not 100% sure that's sufficient here. Don't you also
need to check that the sign bit doesn't change? What about the case
where there's overflow, but the constant doesn't go to zero (ie, some
bits get shifted out, but not all of them?)
+
+ /* No need to set the lower @1 bits of the resulting
+ lshift for "<" and ">=" comparisons. */
+ (if (cmp == LT_EXPR || cmp == GE_EXPR)
+ (cmp @0 (lshift @2 @1))
+
+ /* For "<=" and ">" set the lower @1 lshift bits. */
+ (with {
+ tree type2 = TREE_TYPE (@2);
+ unsigned prec = TYPE_PRECISION (type2);
+ unsigned mask_len = TREE_INT_CST_LOW (@1);
Probably better to be working with unsigned HOST_WIDE_INTs here. And you
can use tree_to_uhwi to extract the value rather than dropping to the
lower level TREE_INT_CST_LOW. I often make this mistake myself.
Jeff