> > +-param=vect-scalar-cost-multiplier=
> > +Common Joined UInteger Var(param_vect_scalar_cost_multiplier) Init(1) IntegerRange(0, 100000) Param Optimization
> > +The scaling multiplier to add to all scalar loop costing when performing vectorization profitability analysis.  The default value is 1.
> > +
> 
> Note this only allows whole number scaling.  May I suggest to instead
> use percentage as unit, thus the multiplier is --param
> param_vect_scalar_cost_multiplier / 100?
> 
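Done, the parameter is now a percentage.  To make the semantics concrete,
here is a minimal standalone sketch of the arithmetic the new expression in
vect_estimate_min_profitable_iters performs.  Note that scale_scalar_cost is
a hypothetical helper used only for illustration (it is not a function in
GCC), and the unsigned types are an assumption; the range check mirrors the
IntegerRange(0, 10000) in the params.opt entry.

```c
#include <assert.h>

/* Hypothetical helper, not part of GCC.  It mirrors the patch's
   (total_cost * param_vect_scalar_cost_multiplier) / 100.  */
static unsigned int
scale_scalar_cost (unsigned int total_cost, unsigned int multiplier_pct)
{
  /* The option is declared with IntegerRange(0, 10000).  */
  assert (multiplier_pct <= 10000);

  /* Multiply first, then divide, so that small costs are not rounded
     away before scaling.  Integer division truncates the result.  */
  return (total_cost * multiplier_pct) / 100;
}
```

With the default of 100 the cost is unchanged.  A value of 150 turns a
scalar cost of 10 into 15, and because the division truncates, a value of
50 turns a cost of 3 into 1.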

Bootstrapped and regtested on aarch64-none-linux-gnu,
arm-none-linux-gnueabihf, and x86_64-pc-linux-gnu
(-m32 and -m64) with no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

        * params.opt (vect-scalar-cost-multiplier): New.
        * tree-vect-loop.cc (vect_estimate_min_profitable_iters): Use it.
        * doc/invoke.texi (vect-scalar-cost-multiplier): Document it.

gcc/testsuite/ChangeLog:

        * gcc.target/aarch64/sve/cost_model_16.c: New test.

-- inline copy of patch --

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 699ee1cc0b7580d4729bbefff8f897eed1c3e49b..95a25c0f63b77f26db05a7b48bfad8f9c58bcc5f 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -17273,6 +17273,10 @@ this parameter.  The default value of this parameter is 50.
 @item vect-induction-float
 Enable loop vectorization of floating point inductions.
 
+@item vect-scalar-cost-multiplier
+Apply the given multiplier, as a percentage, to the scalar loop cost during
+vectorization.  Increasing it makes vector loops appear more profitable.
+
 @item vrp-block-limit
 Maximum number of basic blocks before VRP switches to a lower memory algorithm.
 
diff --git a/gcc/params.opt b/gcc/params.opt
index 1f0abeccc4b9b439ad4a4add6257b4e50962863d..a67f900a63f7187b1daa593fe17cd88f2fc32367 100644
--- a/gcc/params.opt
+++ b/gcc/params.opt
@@ -1253,6 +1253,10 @@ The maximum factor which the loop vectorizer applies to the cost of statements i
 Common Joined UInteger Var(param_vect_induction_float) Init(1) IntegerRange(0, 1) Param Optimization
 Enable loop vectorization of floating point inductions.
 
+-param=vect-scalar-cost-multiplier=
+Common Joined UInteger Var(param_vect_scalar_cost_multiplier) Init(100) IntegerRange(0, 10000) Param Optimization
+The scaling multiplier as a percentage to apply to all scalar loop costing when performing vectorization profitability analysis.  The default value is 100.
+
 -param=vrp-block-limit=
 Common Joined UInteger Var(param_vrp_block_limit) Init(150000) Optimization Param
 Maximum number of basic blocks before VRP switches to a fast model with less memory requirements.
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/cost_model_16.c b/gcc/testsuite/gcc.target/aarch64/sve/cost_model_16.c
new file mode 100644
index 0000000000000000000000000000000000000000..c405591a101d50b4734bc6d65a6d6c01888bea48
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/cost_model_16.c
@@ -0,0 +1,21 @@
+/* { dg-do compile } */
+/* { dg-options "-Ofast -march=armv8-a+sve -mmax-vectorization -fdump-tree-vect-details" } */
+
+void
+foo (char *restrict a, int *restrict b, int *restrict c,
+     int *restrict d, int stride)
+{
+    if (stride <= 1)
+        return;
+
+    for (int i = 0; i < 3; i++)
+        {
+            int res = c[i];
+            int t = b[i * stride];
+            if (a[i] != 0)
+                res = t * d[i];
+            c[i] = res;
+        }
+}
+
+/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */
diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index fe6f3cf188e40396b299ff9e814cc402bc2d4e2d..c18e75794046f506c473b36639e6ae6658a5516b 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -4646,7 +4646,8 @@ vect_estimate_min_profitable_iters (loop_vec_info loop_vinfo,
      TODO: Consider assigning different costs to different scalar
      statements.  */
 
-  scalar_single_iter_cost = loop_vinfo->scalar_costs->total_cost ();
+  scalar_single_iter_cost = (loop_vinfo->scalar_costs->total_cost ()
+                            * param_vect_scalar_cost_multiplier) / 100;
 
   /* Add additional cost for the peeled instructions in prologue and epilogue
      loop.  (For fully-masked loops there will be no peeling.)

Attachment: rb19441.patch
Description: rb19441.patch
