> > +-param=vect-scalar-cost-multiplier=
> > +Common Joined UInteger Var(param_vect_scalar_cost_multiplier) Init(1) IntegerRange(0, 100000) Param Optimization
> > +The scaling multiplier to add to all scalar loop costing when performing vectorization profitability analysis.  The default value is 1.
> > +
>
> Note this only allows whole number scaling.  May I suggest to instead
> use percentage as unit, thus the multiplier is --param
> param_vect_scalar_cost_multiplier / 100?
>

Bootstrapped Regtested on aarch64-none-linux-gnu,
arm-none-linux-gnueabihf, x86_64-pc-linux-gnu
-m32, -m64 and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

	* params.opt (vect-scalar-cost-multiplier): New.
	* tree-vect-loop.cc (vect_estimate_min_profitable_iters): Use it.
	* doc/invoke.texi (vect-scalar-cost-multiplier): Document it.

gcc/testsuite/ChangeLog:

	* gcc.target/aarch64/sve/cost_model_16.c: New test.

-- inline copy of patch --

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 699ee1cc0b7580d4729bbefff8f897eed1c3e49b..95a25c0f63b77f26db05a7b48bfad8f9c58bcc5f 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -17273,6 +17273,10 @@ this parameter.  The default value of this parameter is 50.
 @item vect-induction-float
 Enable loop vectorization of floating point inductions.
 
+@item vect-scalar-cost-multiplier
+Apply the given multiplier % to scalar loop costing during vectorization.
+Increasing the cost multiplier will make vector loops more profitable.
+
 @item vrp-block-limit
 Maximum number of basic blocks before VRP switches to a lower memory algorithm.
 
diff --git a/gcc/params.opt b/gcc/params.opt
index 1f0abeccc4b9b439ad4a4add6257b4e50962863d..a67f900a63f7187b1daa593fe17cd88f2fc32367 100644
--- a/gcc/params.opt
+++ b/gcc/params.opt
@@ -1253,6 +1253,10 @@ The maximum factor which the loop vectorizer applies to the cost of statements i
 Common Joined UInteger Var(param_vect_induction_float) Init(1) IntegerRange(0, 1) Param Optimization
 Enable loop vectorization of floating point inductions.
 
+-param=vect-scalar-cost-multiplier=
+Common Joined UInteger Var(param_vect_scalar_cost_multiplier) Init(100) IntegerRange(0, 10000) Param Optimization
+The scaling multiplier as a percentage to apply to all scalar loop costing when performing vectorization profitability analysis.  The default value is 100.
+
 -param=vrp-block-limit=
 Common Joined UInteger Var(param_vrp_block_limit) Init(150000) Optimization Param
 Maximum number of basic blocks before VRP switches to a fast model with less memory requirements.
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/cost_model_16.c b/gcc/testsuite/gcc.target/aarch64/sve/cost_model_16.c
new file mode 100644
index 0000000000000000000000000000000000000000..c405591a101d50b4734bc6d65a6d6c01888bea48
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/cost_model_16.c
@@ -0,0 +1,21 @@
+/* { dg-do compile } */
+/* { dg-options "-Ofast -march=armv8-a+sve -mmax-vectorization -fdump-tree-vect-details" } */
+
+void
+foo (char *restrict a, int *restrict b, int *restrict c,
+     int *restrict d, int stride)
+{
+  if (stride <= 1)
+    return;
+
+  for (int i = 0; i < 3; i++)
+    {
+      int res = c[i];
+      int t = b[i * stride];
+      if (a[i] != 0)
+        res = t * d[i];
+      c[i] = res;
+    }
+}
+
+/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */
diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index fe6f3cf188e40396b299ff9e814cc402bc2d4e2d..c18e75794046f506c473b36639e6ae6658a5516b 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -4646,7 +4646,8 @@ vect_estimate_min_profitable_iters (loop_vec_info loop_vinfo,
      TODO: Consider assigning different costs to different scalar
      statements.  */
 
-  scalar_single_iter_cost = loop_vinfo->scalar_costs->total_cost ();
+  scalar_single_iter_cost = (loop_vinfo->scalar_costs->total_cost ()
+			     * param_vect_scalar_cost_multiplier) / 100;
 
   /* Add additional cost for the peeled instructions in prologue and epilogue
      loop.  (For fully-masked loops there will be no peeling.)
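
For reference, the percentage scaling in the tree-vect-loop.cc hunk can be sketched in isolation as below.  This is only an illustration: scale_scalar_cost is a hypothetical helper that does not exist in GCC, the unsigned types are an assumption, and the widened intermediate is added here to avoid overflow in the sketch rather than taken from the patch.

```c
#include <assert.h>

/* Hypothetical helper, not part of GCC: mirrors the integer arithmetic
   (total_cost * param_vect_scalar_cost_multiplier) / 100 from the patch.
   The parameter and return types are assumptions for illustration.  */
static unsigned int
scale_scalar_cost (unsigned int total_cost, unsigned int multiplier_pct)
{
  /* Same bounds as IntegerRange(0, 10000) in the params.opt entry.  */
  assert (multiplier_pct <= 10000);

  /* Multiply before dividing so that truncation happens only once.
     The 64-bit intermediate is a choice made for this sketch only.  */
  return (unsigned int) (((unsigned long long) total_cost * multiplier_pct)
			 / 100);
}
```

With the default of 100 the cost is unchanged.  A scalar cost of 8 becomes 12 at 150 and 4 at 50, and because the division is integral a cost of 3 at 50 truncates to 1.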
rb19441.patch