> -----Original Message-----
> From: Richard Biener <[email protected]>
> Sent: Friday, January 16, 2026 6:23 PM
> To: [email protected]
> Cc: Liu, Hongtao <[email protected]>
> Subject: [PATCH] target/123603 - add --param ix86-vect-compare-costs
>
> The following allows switching the x86 target to use the vectorizer cost
> comparison mechanism to select between different vector mode variants of
> a vectorization. The default is still to not do this, but this allows an
> opt-in.
>
The patch LGTM.
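As a usage illustration on my side (the file name and the -march value below
are just placeholders, not from the patch), the new behavior would then be
requested per compilation with something like

  gcc -O3 -march=x86-64-v3 --param ix86-vect-compare-costs=1 -fopt-info-vec -c test.c

which, as I understand it, lets the vectorizer try the vector modes the
backend pushes, compare their costs and pick the cheapest, instead of going
with the first mode that vectorizes.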
> Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.
>
> For next stage1 I'll probably propose flipping the switch (or not adding
> the switch at all). I'll follow up with a report on how SPEC CPU 2017
> behaves with this on vs.
If possible, we should run the next SPEC CPU benchmarks (with more
vectorization) to help decide whether to switch it on.
I did similar tests on SPEC CPU 2017 two years ago; there were no clear
benefits, and compile times got longer, probably due to the crude cost model.
> off before considering whether we want this switch for GCC 16 or not
> (e.g. in case it has only negative effects overall).
It would be quite interesting if we find that some benchmarks do show
benefits.
>
> PR target/123603
> * config/i386/i386.opt (-param=ix86-vect-compare-costs=): Add.
> * config/i386/i386.cc (ix86_autovectorize_vector_modes): Honor it.
> * doc/invoke.texi (ix86-vect-compare-costs): Document.
>
> * gcc.dg/vect/costmodel/x86_64/costmodel-pr123603.c: New testcase.
> ---
> gcc/config/i386/i386.cc                         |  2 +-
> gcc/config/i386/i386.opt                        |  4 ++++
> gcc/doc/invoke.texi                             |  3 +++
> .../vect/costmodel/x86_64/costmodel-pr123603.c  | 15 +++++++++++++++
> 4 files changed, 23 insertions(+), 1 deletion(-)
> create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-pr123603.c
>
> diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
> index 6bf4af8bbe3..a3d0f7cb649 100644
> --- a/gcc/config/i386/i386.cc
> +++ b/gcc/config/i386/i386.cc
> @@ -25700,7 +25700,7 @@ ix86_autovectorize_vector_modes (vector_modes *modes, bool all)
>    if (TARGET_SSE2)
>      modes->safe_push (V4QImode);
>
> -  return 0;
> +  return ix86_vect_compare_costs ? VECT_COMPARE_COSTS : 0;
>  }
>
>  /* Implemenation of targetm.vectorize.get_mask_mode. */
> diff --git a/gcc/config/i386/i386.opt b/gcc/config/i386/i386.opt
> index 99bb674812b..ef9efabcff6 100644
> --- a/gcc/config/i386/i386.opt
> +++ b/gcc/config/i386/i386.opt
> @@ -1249,6 +1249,10 @@ Enable conservative small loop unrolling.
>  Target Joined UInteger Var(ix86_vect_unroll_limit) Init(4) Param
>  Limit how much the autovectorizer may unroll a loop.
>
> +-param=ix86-vect-compare-costs=
> +Target Joined UInteger Var(ix86_vect_compare_costs) Init(0) IntegerRange(0, 1) Param Optimization
> +Whether x86 vectorizer cost modeling compares costs of different vector sizes.
> +
>  mlam=
>  Target RejectNegative Joined Enum(lam_type) Var(ix86_lam_type) Init(lam_none)
>  -mlam=[none|u48|u57] Instrument meta data position in user data pointers.
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index b703b531d75..5092e4ba9ad 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -18213,6 +18213,9 @@ the discovery is aborted.
> @item ix86-vect-unroll-limit
> Limit how much the autovectorizer may unroll a loop.
>
> +@item ix86-vect-compare-costs
> +Whether x86 vectorizer cost modeling compares costs of different vector sizes.
> +
> @end table
>
> @end table
> diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-pr123603.c b/gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-pr123603.c
> new file mode 100644
> index 00000000000..c074176a7e4
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-pr123603.c
> @@ -0,0 +1,15 @@
> +/* { dg-do compile } */
> +/* { dg-additional-options "--param ix86-vect-compare-costs=1" } */
> +
> +void foo (int *block)
> +{
> +  for (int i = 0; i < 3; ++i)
> +    {
> +      int a = block[i*9];
> +      int b = block[i*9+1];
> +      block[i*9] = a + 10;
> +      block[i*9+1] = b + 10;
> +    }
> +}
> +
> +/* { dg-final { scan-tree-dump "optimized: loop vectorized using 8 byte vectors" "vect" } } */
> --
> 2.51.0