I got too clever trying to simplify the right shift computation in my recent ifcvt patch. Interestingly enough, I haven't seen anything but the Linaro CI configuration actually trip the problem, though the code is clearly wrong.

The problem I was trying to avoid were the leading zeros when calling clz on a HWI when the real object is just say 32 bits.

The net is we get a right shift count of "2" when we really wanted a right shift count of 30. That causes the execution aspect of bics_3 to fail.

The scan failures are due to creating slightly more efficient code. THe new code sequences don't need to use conditional execution for selection and thus we can use bic rather bics which requires a twiddle in the scan.

I reviewed recent bug reports and haven't seen one for this issue. So no new testcase as this is covered by the armv7 testsuite in the right configuration.

Bootstrapped and regression tested on x86_64, also verified it fixes the Linaro reported CI failure and verified the crosses are still happy. Pushing to the trunk.

Jeff
commit 56ca14c4c4fa2279ff80b44b495d375c42795598
Author: Jeff Law <j...@ventanamicro.com>
Date:   Sun Aug 24 19:55:44 2025 -0600

    Fix invalid right shift count with recent ifcvt changes
    
    I got too clever trying to simplify the right shift computation in my recent
    ifcvt patch.  Interestingly enough, I haven't seen anything but the Linaro 
CI
    configuration actually trip the problem, though the code is clearly wrong.
    
    The problem I was trying to avoid were the leading zeros when calling clz 
on a
    HWI when the real object is just say 32 bits.
    
    The net is we get a right shift count of "2" when we really wanted a right
    shift count of 30.  That causes the execution aspect of bics_3 to fail.
    
    The scan failures are due to creating slightly more efficient code. THe new
    code sequences don't need to use conditional execution for selection and 
thus
    we can use bic rather bics which requires a twiddle in the scan.
    
    I reviewed recent bug reports and haven't seen one for this issue.  So no 
new
    testcase as this is covered by the armv7 testsuite in the right 
configuration.
    
    Bootstrapped and regression tested on x86_64, also verified it fixes the 
Linaro
    reported CI failure and verified the crosses are still happy.  Pushing to 
the
    trunk.
    
    gcc/
            * ifcvt.cc (noce_try_sign_bit_splat): Fix right shift computation.
    
    gcc/testsuite/
            * gcc.target/arm/bics_3.c: Adjust expected output

diff --git a/gcc/ifcvt.cc b/gcc/ifcvt.cc
index 1181b44ff3e..4579148750d 100644
--- a/gcc/ifcvt.cc
+++ b/gcc/ifcvt.cc
@@ -1283,13 +1283,15 @@ noce_try_sign_bit_splat (struct noce_if_info *if_info)
         one with smaller cost.  */
       rtx and_form = gen_rtx_AND (mode, temp, GEN_INT (val_a));
       rtx shift_left = gen_rtx_ASHIFT (mode, temp, GEN_INT (ctz_hwi (val_a)));
-      rtx shift_right = gen_rtx_LSHIFTRT (mode, temp, GEN_INT (ctz_hwi 
(~val_a)));
+      HOST_WIDE_INT rshift_count
+       = (clz_hwi (val_a) & (GET_MODE_PRECISION (mode).to_constant() - 1));
+      rtx shift_right = gen_rtx_LSHIFTRT (mode, temp, GEN_INT (rshift_count));
       bool speed_p = optimize_insn_for_speed_p ();
       if (exact_log2 (val_a + 1) >= 0
          && (rtx_cost (shift_right, mode, SET, 1, speed_p)
              <= rtx_cost (and_form, mode, SET, 1, speed_p)))
        temp = expand_simple_binop (mode, LSHIFTRT, temp,
-                                   GEN_INT (ctz_hwi (~val_a)),
+                                   GEN_INT (rshift_count),
                                    if_info->x, false, OPTAB_WIDEN);
       else if (exact_log2 (~val_a + 1) >= 0
               && (rtx_cost (shift_left, mode, SET, 1, speed_p)
@@ -1333,13 +1335,15 @@ noce_try_sign_bit_splat (struct noce_if_info *if_info)
         one with smaller cost.  */
       rtx and_form = gen_rtx_AND (mode, temp, GEN_INT (val_b));
       rtx shift_left = gen_rtx_ASHIFT (mode, temp, GEN_INT (ctz_hwi (val_b)));
-      rtx shift_right = gen_rtx_LSHIFTRT (mode, temp, GEN_INT (ctz_hwi 
(~val_b)));
+      HOST_WIDE_INT rshift_count
+       = (clz_hwi (val_b) & (GET_MODE_PRECISION (mode).to_constant() - 1));
+      rtx shift_right = gen_rtx_LSHIFTRT (mode, temp, GEN_INT (rshift_count));
       bool speed_p = optimize_insn_for_speed_p ();
       if (exact_log2 (val_b + 1) >= 0
          && (rtx_cost (shift_right, mode, SET, 1, speed_p)
              <= rtx_cost (and_form, mode, SET, 1, speed_p)))
        temp = expand_simple_binop (mode, LSHIFTRT, temp,
-                                   GEN_INT (ctz_hwi (~val_b)),
+                                   GEN_INT (rshift_count),
                                    if_info->x, false, OPTAB_WIDEN);
       else if (exact_log2 (~val_b + 1) >= 0
               && (rtx_cost (shift_left, mode, SET, 1, speed_p)
diff --git a/gcc/testsuite/gcc.target/arm/bics_3.c 
b/gcc/testsuite/gcc.target/arm/bics_3.c
index 4d6938948a1..36b0b277d3f 100644
--- a/gcc/testsuite/gcc.target/arm/bics_3.c
+++ b/gcc/testsuite/gcc.target/arm/bics_3.c
@@ -33,5 +33,5 @@ main (void)
   return 0;
 }
 
-/* { dg-final { scan-assembler-times "bics\tr\[0-9\]+, r\[0-9\]+, r\[0-9\]+" 2 
} } */
-/* { dg-final { scan-assembler-times "bics\tr\[0-9\]+, r\[0-9\]+, r\[0-9\]+, 
.sl #2" 1 } } */
+/* { dg-final { scan-assembler-times "bic\tr\[0-9\]+, r\[0-9\]+, r\[0-9\]+" 2 
} } */
+/* { dg-final { scan-assembler-times "bic\tr\[0-9\]+, r\[0-9\]+, r\[0-9\]+, 
.sl #2" 1 } } */

Reply via email to