On 5/9/2026 4:52 PM, Jeffrey Law wrote:
On 4/16/2026 6:29 AM, Daniel Henrique Barboza wrote:
From: Daniel Barboza <[email protected]>
Canonicalize right shift non-equality comparisons with constants by
turn them into a comparison with a left shifted constant. Assuming the
generic format:
(A >> CST1) CMP CST2
For CMP (<, >=) we'll compare A with CST2 left shifted by CST1:
- (A >> CST1) < CST2 -> A < (CST2 << CST1)
- (A >> CST1) >= CST2 -> A >= (CST2 << CST1)
And for CMP (<=, >) we need to IOR the lower CST1 bits from the left
shift:
- (A >> CST1) <= CST2 -> A <= (CST2 << CST1) | mask
- (A >> CST1) > CST2 -> A > (CST2 << CST1) | mask
Given that the right hand side changes involves just constants, in the
end we'll replace a rshift + cmp with just a cmp.
Bootstrapped and regression tested in x86, aarch64 and RISC-V.
PR tree-optimization/124808
gcc/ChangeLog:
* match.pd(`(A >> CST1) CMP CST2`): New pattern.
gcc/testsuite/ChangeLog:
* gcc.dg/tree-ssa/pr124808-2.c: New test.
* gcc.dg/tree-ssa/pr124808.c: New test.
---
Changes from v2:
- added "#if GIMPLE" and "single_use" conditions in the pattern
- changed pr124808-2.c to scan 'forwprop1' dump instead of 'gimple'
- reverted all changes made in sat-1.c
- reverted all changes made in pr104479.c
- v2 link: https://gcc.gnu.org/pipermail/gcc-patches/2026-April/713005.html
gcc/match.pd | 58 ++++++++++++++++
gcc/testsuite/gcc.dg/tree-ssa/pr124808-2.c | 44 ++++++++++++
gcc/testsuite/gcc.dg/tree-ssa/pr124808.c | 78 ++++++++++++++++++++++
3 files changed, 180 insertions(+)
create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr124808-2.c
create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr124808.c
diff --git a/gcc/match.pd b/gcc/match.pd
index 7b652afb43d..8eece2223f9 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -5367,6 +5367,64 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
@0)))))
#endif
+#if GIMPLE
+/* PR124808: (A >> CST1) CMP CST2 -> A CMP (CST2 << CST1)
+ Canonicalize non-equality comparisons between a right
+ shift and a constant, turning it into a comparison
+ with a constant that is left shifted. If we view A as:
+
+ A: "|--- CST2 ----|--CST1--|"
+
+ A >> CST1 will be equal to CST2 for all A values in the
+ range CST2 << CST1 to (CST2 << CST1) | 1s_mask (CST1).
+
+ Therefore:
+ - (A >> CST1) < CST2 -> A < (CST2 << CST1), A must
+ be smaller than all values from the range;
+ - (A >> CST1) <= CST2 -> A <= (CST2 << CST1) | mask,
+ A must be smaller or equal than the range end;
+ - (A >> CST1) > CST2 -> A > (CST2 << CST1) | mask,
+ A must be greater than all values from the range;
+ - (A >> CST1) >= CST2 -> A >= (CST2 << CST1), A must
+ be greater or equal than the range start.
+
+ We're also using "single_use" and wrapping around "if GIMPLE"
+ because (1) this class "VA1 LSHIFT/RSHIFT VAL2 CMP VAL3" of
+ optimizations tend to match CTZ|CLZ builtin patterns and we
+ don't want to trip on them and (2) we will get in the way of
+ certain optimizations (see ARM's sat-1.c test) that are done
+ using GENERIC due to how forwprop currently works. */
+(for cmp (le lt ge gt)
+ (simplify
+ (cmp (rshift@3 @0 INTEGER_CST@1) INTEGER_CST@2)
+ (if (INTEGRAL_TYPE_P (TREE_TYPE (@0))
+ && single_use (@3)
+ && tree_fits_uhwi_p (@1)
+ && tree_to_uhwi (@1) < TYPE_PRECISION (TREE_TYPE (@0))
+ && tree_int_cst_sgn (@2) >= 0
+ /* If @2 is nonzero check if we'll overflow when doing
+ @2 << @1 by doing a wi::lshift and checking if the
+ result is zero (i.e. overflow). */
+ && (wi::to_wide (@2) == 0
+ || !wi::eq_p (wi::lshift (wi::to_wide (@2), wi::to_wide (@1)),
+ wi::zero (TYPE_PRECISION (TREE_TYPE (@2))))))
So you're using the constant left shifted to see if there's an overflow. But
I'm not 100% sure that's sufficient here. Don't you also need to check that
the sign bit doesn't change? What about the case where there's overflow, but
the constant doesn't go to zero (ie, some bits get shifted out, but not all of
them?)
My idea here was to use wi::lshift() and check for a 0 result, which
I thought it would indicate overflows. I was wrong: the docs says that
it can return 0 if the shift amount is greater or equal than the precision
of the first operand. For some reason I thought it would cover overflows
and/or sign changes too.
I changed it to check that, if @2 is nonzero, if the shift amount will
push the topmost bit to the sign bit, or if we'll have any 1s being
shifted out causing an overflow if @2 is unsigned:
diff --git a/gcc/match.pd b/gcc/match.pd
index 8eece2223f9..11a1c00a7f5 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -5402,12 +5402,16 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
&& tree_fits_uhwi_p (@1)
&& tree_to_uhwi (@1) < TYPE_PRECISION (TREE_TYPE (@0))
&& tree_int_cst_sgn (@2) >= 0
- /* If @2 is nonzero check if we'll overflow when doing
- @2 << @1 by doing a wi::lshift and checking if the
- result is zero (i.e. overflow). */
+ && tree_to_uhwi (@1) < TYPE_PRECISION (TREE_TYPE (@2))
&& (wi::to_wide (@2) == 0
- || !wi::eq_p (wi::lshift (wi::to_wide (@2), wi::to_wide (@1)),
- wi::zero (TYPE_PRECISION (TREE_TYPE (@2))))))
+ /* If @2 is nonzero, verify if we'll overflow when
+ doing @2 << @1 by checking if the shift amount
+ can start throwing 1s away if type_unsigned, or
+ if the shift amount can reach the sign bit. */
+ || (TYPE_UNSIGNED (TREE_TYPE (@2))
+ && wi::leu_p (wi::to_wide (@1), wi::clz (wi::to_wide (@2))))
+ || (!TYPE_UNSIGNED (TREE_TYPE (@2))
+ && wi::ltu_p (wi::to_wide (@1), wi::clz (wi::to_wide (@2))))))
Hopefully this covers all the possible cases where we might goof up
with this simplification.
+
+ /* No need to set the lower @1 bits of the resulting
+ lshift for "<" and ">=" comparisons. */
+ (if (cmp == LT_EXPR || cmp == GE_EXPR)
+ (cmp @0 (lshift @2 @1))
+
+ /* For "<=" and ">" set the lower @1 lshift bits. */
+ (with {
+ tree type2 = TREE_TYPE (@2);
+ unsigned prec = TYPE_PRECISION (type2);
+ unsigned mask_len = TREE_INT_CST_LOW (@1);
Probably better to be working with unsigned HOST_WIDE_INTs here. And you can
use tree_to_uhwi to extract the value rather than dropping to the lower level
TREE_INT_CST_LOW. I often make this mistake myself.
Makes sense. Just changed to use tree_to_uhwi in 'mask_len'.
I'll do the usual bootstrap/regression test and post a v4. Thanks,
Daniel
Jeff