[gcc r16-6891] target/123603 - add --param ix86-vect-compare-costs

Richard Biener via Gcc-cvs Mon, 19 Jan 2026 00:48:35 -0800

https://gcc.gnu.org/g:4b2db74430233302e4c711d6f958a2e7fbc643f3


commit r16-6891-g4b2db74430233302e4c711d6f958a2e7fbc643f3
Author: Richard Biener <[email protected]>
Date:   Fri Jan 16 10:22:17 2026 +0100

    target/123603 - add --param ix86-vect-compare-costs
    
    The following allows switching the x86 target to use the vectorizer
    cost comparison mechanism to select between different vector mode
    variants of a vectorization.  The default is still to not do this,
    but this allows an opt-in.
    
    On SPEC CPU 2017 for -Ofast -march=znver4 this shows 2463 out of
    39706 vectorized loops changing mode.  In 503 out of 12378 cases
    we decided to not use masked epilogs.  Compile-time increases by
    ~1% overall.  With a quick 1-run there do not seem to be any
    effects beyond noise for INT, for this particular optimization and
    target option combination and the actual hardware it ran on.  For
    FP, 549.fotonik3d_r improves by 6% (confirmed with a 2-run).
    
    This was triggered by PR123190 and PR123603, which have cases where
    comparing costs would have resulted in the faster vector size being
    used.  Both were reported for -O2 -march=x86-64-v3 -flto and with PGO.
    The regression of 548.exchange2_r recorded in PR123603 with these
    flags is resolved with the --param (performance improves by 13%).  I
    don't have SPEC 2006 on that machine so did not verify the PR123190
    433.milc regression, but that has been improved with the two earlier
    patches.  The --param has no effect on the testcase in the PR.
    
    I do expect that some of our tricks in the x86 cost model to make
    larger vector sizes unprofitable will be obsolete or
    counter-productive with cost comparison turned on.
    
            PR target/123603
            * config/i386/i386.opt (-param=ix86-vect-compare-costs=): Add.
            * config/i386/i386.cc (ix86_autovectorize_vector_modes): Honor it.
            * doc/invoke.texi (ix86-vect-compare-costs): Document.
    
            * gcc.dg/vect/costmodel/x86_64/costmodel-pr123603.c: New testcase.
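
The difference between the default behavior and the opt-in can be sketched in plain C.  This is only an illustrative model, not GCC code: struct mode_candidate, pick_mode and the scalar cost field are hypothetical names invented here, and the real vectorizer compares richer cost information than a single number.  The sketch assumes the documented meaning of VECT_COMPARE_COSTS (try all modes and pick the cheapest; without it, the first mode that works is used).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one vector mode tried by the vectorizer.  */
struct mode_candidate
{
  const char *name;   /* Illustrative label, e.g. "32-byte".  */
  int vectorizable;   /* Nonzero if the loop can be vectorized this way.  */
  unsigned cost;      /* Made-up cost estimate; lower is better.  */
};

/* Return the index of the chosen candidate, or -1 if none works.
   CANDS is in the target's preference order.  COMPARE_COSTS models
   --param ix86-vect-compare-costs=1.  */
int
pick_mode (const struct mode_candidate *cands, size_t n, int compare_costs)
{
  assert (cands != NULL || n == 0);
  int best = -1;
  for (size_t i = 0; i < n; ++i)
    {
      if (!cands[i].vectorizable)
        continue;
      if (!compare_costs)
        return (int) i;  /* Default: the first mode that works wins.  */
      if (best < 0 || cands[i].cost < cands[best].cost)
        best = (int) i;  /* Opt-in: keep the cheapest; ties keep the earlier.  */
    }
  return best;
}
```

With three vectorizable candidates whose made-up costs are 40, 30 and 12 in preference order, pick_mode returns index 0 when compare_costs is 0 and index 2 when it is 1.  This mirrors how the opt-in can change the mode chosen for a loop.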

Diff:
---
 gcc/config/i386/i386.cc                                   |  2 +-
 gcc/config/i386/i386.opt                                  |  4 ++++
 gcc/doc/invoke.texi                                       |  3 +++
 .../gcc.dg/vect/costmodel/x86_64/costmodel-pr123603.c     | 15 +++++++++++++++
 4 files changed, 23 insertions(+), 1 deletion(-)

diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index 6bf4af8bbe3b..a3d0f7cb6496 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -25700,7 +25700,7 @@ ix86_autovectorize_vector_modes (vector_modes *modes, bool all)
   if (TARGET_SSE2)
     modes->safe_push (V4QImode);
 
-  return 0;
+  return ix86_vect_compare_costs ? VECT_COMPARE_COSTS : 0;
 }
 
 /* Implemenation of targetm.vectorize.get_mask_mode.  */
diff --git a/gcc/config/i386/i386.opt b/gcc/config/i386/i386.opt
index 4942f5124174..3b530944a36b 100644
--- a/gcc/config/i386/i386.opt
+++ b/gcc/config/i386/i386.opt
@@ -1257,6 +1257,10 @@ Enable conservative small loop unrolling.
 Target Joined UInteger Var(ix86_vect_unroll_limit) Init(4) Param
 Limit how much the autovectorizer may unroll a loop.
 
+-param=ix86-vect-compare-costs=
+Target Joined UInteger Var(ix86_vect_compare_costs) Init(0) IntegerRange(0, 1) Param Optimization
+Whether x86 vectorizer cost modeling compares costs of different vector sizes.
+
 mlam=
 Target RejectNegative Joined Enum(lam_type) Var(ix86_lam_type) Init(lam_none)
 -mlam=[none|u48|u57] Instrument meta data position in user data pointers.
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 6104631d34a2..9420462538f7 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -18210,6 +18210,9 @@ the discovery is aborted.
 @item ix86-vect-unroll-limit
 Limit how much the autovectorizer may unroll a loop.
 
+@item ix86-vect-compare-costs
+Whether x86 vectorizer cost modeling compares costs of different vector sizes.
+
 @end table
 
 @end table
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-pr123603.c b/gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-pr123603.c
new file mode 100644
index 000000000000..c074176a7e42
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-pr123603.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-additional-options "--param ix86-vect-compare-costs=1" } */
+
+void foo (int *block)
+{
+  for (int i = 0; i < 3; ++i)
+    {
+      int a = block[i*9];
+      int b = block[i*9+1];
+      block[i*9] = a + 10;
+      block[i*9+1] = b + 10;
+    }
+}
+
+/* { dg-final { scan-tree-dump "optimized: loop vectorized using 8 byte vectors" "vect" } } */
