I got too clever trying to simplify the right shift computation in my
recent ifcvt patch. Interestingly enough, I haven't seen anything but
the Linaro CI configuration actually trip the problem, though the code
is clearly wrong.
The problem I was trying to avoid were the leading zeros when calling
clz on a HWI when the real object is just say 32 bits.
The net is we get a right shift count of "2" when we really wanted a
right shift count of 30. That causes the execution aspect of bics_3 to
fail.
The scan failures are due to creating slightly more efficient code. THe
new code sequences don't need to use conditional execution for selection
and thus we can use bic rather bics which requires a twiddle in the scan.
I reviewed recent bug reports and haven't seen one for this issue. So
no new testcase as this is covered by the armv7 testsuite in the right
configuration.
Bootstrapped and regression tested on x86_64, also verified it fixes the
Linaro reported CI failure and verified the crosses are still happy.
Pushing to the trunk.
Jeff
commit 56ca14c4c4fa2279ff80b44b495d375c42795598
Author: Jeff Law <j...@ventanamicro.com>
Date: Sun Aug 24 19:55:44 2025 -0600
Fix invalid right shift count with recent ifcvt changes
I got too clever trying to simplify the right shift computation in my recent
ifcvt patch. Interestingly enough, I haven't seen anything but the Linaro
CI
configuration actually trip the problem, though the code is clearly wrong.
The problem I was trying to avoid were the leading zeros when calling clz
on a
HWI when the real object is just say 32 bits.
The net is we get a right shift count of "2" when we really wanted a right
shift count of 30. That causes the execution aspect of bics_3 to fail.
The scan failures are due to creating slightly more efficient code. THe new
code sequences don't need to use conditional execution for selection and
thus
we can use bic rather bics which requires a twiddle in the scan.
I reviewed recent bug reports and haven't seen one for this issue. So no
new
testcase as this is covered by the armv7 testsuite in the right
configuration.
Bootstrapped and regression tested on x86_64, also verified it fixes the
Linaro
reported CI failure and verified the crosses are still happy. Pushing to
the
trunk.
gcc/
* ifcvt.cc (noce_try_sign_bit_splat): Fix right shift computation.
gcc/testsuite/
* gcc.target/arm/bics_3.c: Adjust expected output
diff --git a/gcc/ifcvt.cc b/gcc/ifcvt.cc
index 1181b44ff3e..4579148750d 100644
--- a/gcc/ifcvt.cc
+++ b/gcc/ifcvt.cc
@@ -1283,13 +1283,15 @@ noce_try_sign_bit_splat (struct noce_if_info *if_info)
one with smaller cost. */
rtx and_form = gen_rtx_AND (mode, temp, GEN_INT (val_a));
rtx shift_left = gen_rtx_ASHIFT (mode, temp, GEN_INT (ctz_hwi (val_a)));
- rtx shift_right = gen_rtx_LSHIFTRT (mode, temp, GEN_INT (ctz_hwi
(~val_a)));
+ HOST_WIDE_INT rshift_count
+ = (clz_hwi (val_a) & (GET_MODE_PRECISION (mode).to_constant() - 1));
+ rtx shift_right = gen_rtx_LSHIFTRT (mode, temp, GEN_INT (rshift_count));
bool speed_p = optimize_insn_for_speed_p ();
if (exact_log2 (val_a + 1) >= 0
&& (rtx_cost (shift_right, mode, SET, 1, speed_p)
<= rtx_cost (and_form, mode, SET, 1, speed_p)))
temp = expand_simple_binop (mode, LSHIFTRT, temp,
- GEN_INT (ctz_hwi (~val_a)),
+ GEN_INT (rshift_count),
if_info->x, false, OPTAB_WIDEN);
else if (exact_log2 (~val_a + 1) >= 0
&& (rtx_cost (shift_left, mode, SET, 1, speed_p)
@@ -1333,13 +1335,15 @@ noce_try_sign_bit_splat (struct noce_if_info *if_info)
one with smaller cost. */
rtx and_form = gen_rtx_AND (mode, temp, GEN_INT (val_b));
rtx shift_left = gen_rtx_ASHIFT (mode, temp, GEN_INT (ctz_hwi (val_b)));
- rtx shift_right = gen_rtx_LSHIFTRT (mode, temp, GEN_INT (ctz_hwi
(~val_b)));
+ HOST_WIDE_INT rshift_count
+ = (clz_hwi (val_b) & (GET_MODE_PRECISION (mode).to_constant() - 1));
+ rtx shift_right = gen_rtx_LSHIFTRT (mode, temp, GEN_INT (rshift_count));
bool speed_p = optimize_insn_for_speed_p ();
if (exact_log2 (val_b + 1) >= 0
&& (rtx_cost (shift_right, mode, SET, 1, speed_p)
<= rtx_cost (and_form, mode, SET, 1, speed_p)))
temp = expand_simple_binop (mode, LSHIFTRT, temp,
- GEN_INT (ctz_hwi (~val_b)),
+ GEN_INT (rshift_count),
if_info->x, false, OPTAB_WIDEN);
else if (exact_log2 (~val_b + 1) >= 0
&& (rtx_cost (shift_left, mode, SET, 1, speed_p)
diff --git a/gcc/testsuite/gcc.target/arm/bics_3.c
b/gcc/testsuite/gcc.target/arm/bics_3.c
index 4d6938948a1..36b0b277d3f 100644
--- a/gcc/testsuite/gcc.target/arm/bics_3.c
+++ b/gcc/testsuite/gcc.target/arm/bics_3.c
@@ -33,5 +33,5 @@ main (void)
return 0;
}
-/* { dg-final { scan-assembler-times "bics\tr\[0-9\]+, r\[0-9\]+, r\[0-9\]+" 2
} } */
-/* { dg-final { scan-assembler-times "bics\tr\[0-9\]+, r\[0-9\]+, r\[0-9\]+,
.sl #2" 1 } } */
+/* { dg-final { scan-assembler-times "bic\tr\[0-9\]+, r\[0-9\]+, r\[0-9\]+" 2
} } */
+/* { dg-final { scan-assembler-times "bic\tr\[0-9\]+, r\[0-9\]+, r\[0-9\]+,
.sl #2" 1 } } */