Hi,

Below is a patch to fix PR121700, proposed for master. Kindly review.
Regression testing on powerpc64le is still running; I will post the results in
this thread.

Thank you,
Avinash Jayakar

rs6000: Update scalar cost of {TRUNC,FLOOR}_MOD_EXPR [PR121700]

The default cost model at -O2 is the VERY_CHEAP model, which
produces sub-optimal code for loops with TRUNC/FLOOR modulo expressions.
Currently the vectorized variant of the modulo operator is almost 4 times
faster than the scalar variant for 32-bit integers on Power10.

In order to compare the scalar and vectorized variants of the loop fairly
in vect_analyze_loop_costing, update the scalar cost for
TRUNC_MOD_EXPR and FLOOR_MOD_EXPR. The value 6 is the number of
instructions currently generated for these expressions at -O2.
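
For reference, a minimal sketch of the kind of loop this affects, plus the
semantic difference between the two tree codes. The names trunc_mod_loop and
floor_mod are hypothetical and are not taken from the PR testcase. C's %
operator on signed integers truncates toward zero (TRUNC_MOD_EXPR), while
floor-mod gives a result with the sign of the divisor (FLOOR_MOD_EXPR).

```c
/* Hypothetical kernel: a loop whose body is a single signed modulo.
   In C, % truncates toward zero, which corresponds to TRUNC_MOD_EXPR.  */
void
trunc_mod_loop (int *restrict out, const int *restrict a,
                const int *restrict b, int n)
{
  for (int i = 0; i < n; i++)
    out[i] = a[i] % b[i];
}

/* Floor-mod semantics written out by hand for illustration: the
   remainder takes the sign of the divisor.  Assumes y != 0.  */
int
floor_mod (int x, int y)
{
  int r = x % y;
  if (r != 0 && ((r < 0) != (y < 0)))
    r += y;
  return r;
}
```

The two forms differ only when the operands have opposite signs, e.g.
-7 % 3 is -1 under truncation but 2 under floor semantics.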

2025-09-15  Avinash Jayakar <[email protected]>

gcc/ChangeLog:
        PR target/121700
        * config/rs6000/rs6000.cc (rs6000_adjust_vect_cost_per_stmt): Add cost
        for {FLOOR,TRUNC}_MOD_EXPR.
---
 gcc/config/rs6000/rs6000.cc | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index 8dd23f8619c..183e454c5bc 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -5311,6 +5311,12 @@ rs6000_adjust_vect_cost_per_stmt (enum vect_cost_for_stmt kind,
       tree_code subcode = gimple_assign_rhs_code (stmt_info->stmt);
       if (subcode == COND_EXPR)
        return 2;
+      /* For {FLOOR,TRUNC}_MOD_EXPR, cost them a bit higher in order to
+         fairly compare the scalar and vector costs, since no single
+         instruction can evaluate these expressions.  Currently using the
+         number of instructions generated for these expressions.  */
+      if (subcode == FLOOR_MOD_EXPR || subcode == TRUNC_MOD_EXPR)
+       return 6;
     }
 
   return 0;
-- 
2.47.3
