The following adds special handling for OMP SIMD vector call costs,
which were not costed at all and for which a single simple vector
stmt isn't appropriate. PR125174 shows that even when AVX imposes
more overhead (from also slightly bogus costing) than SSE, when
there are two OMP SIMD calls involved, doing fewer of those should
trump that.
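
To make the trade-off concrete, here is a toy model of the situation. It
is only a sketch: `ToyTarget`, `toy_loop_cost` and every number used with
them are made up for illustration and are not GCC's cost model or the
values from the PR.

```cpp
#include <cassert>

// Toy model, not GCC's: all names and numbers are illustrative assumptions.
struct ToyTarget {
  int vf;             // lanes per vector iteration (e.g. 2 doubles SSE, 4 AVX)
  int body_cost;      // cost of the non-call vector stmts per vector iteration
  int call_cost;      // cost charged for one OMP SIMD vector call
  int calls_per_iter; // OMP SIMD vector calls per vector iteration
};

// Total cost of covering n scalar iterations with this vector variant.
int toy_loop_cost(const ToyTarget &t, int n) {
  assert(t.vf > 0 && n % t.vf == 0);
  int vector_iters = n / t.vf;
  return vector_iters * (t.body_cost + t.calls_per_iter * t.call_cost);
}
```

With the calls costed at 0, a hypothetical SSE variant {2, 10, 0, 2}
beats a hypothetical AVX variant {4, 24, 0, 2} for n = 8 (40 vs. 48).
Once each call is charged 40, AVX wins (208 vs. 360) simply because it
issues half as many calls.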
Bootstrap & regtest ongoing on x86_64-unknown-linux-gnu.

I've verified this resolves the observed 465.tonto regression. I
thought about catching all OMP SIMD vectorized stmts but then
realized scalar costing doesn't see this yet, so we'd make all
vectorizations unprofitable. We cannot handle all calls this
way either, as some directly expand to native insns (popcount, etc.).
So I fear we have to maintain a positive list. There are 52
'notinbranch' SIMD declarations in glibc 2.38 on x86_64, probably
different ones on ARM.
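
For readers less familiar with these, a 'notinbranch' SIMD declaration
looks roughly like the sketch below. `toy_kernel` and `toy_apply` are
hypothetical names used only for illustration; glibc's actual libm
declarations are not reproduced here.

```cpp
#include <cstddef>

// Hypothetical function: the pragma asks the compiler to also provide
// vector variants that are only ever called unconditionally.
#pragma omp declare simd notinbranch
double toy_kernel(double x) { return x * x + 1.0; }

// A loop whose vectorized form would call a vector variant of toy_kernel
// once per vector iteration (when built with OpenMP SIMD support enabled).
void toy_apply(double *out, const double *in, std::size_t n) {
#pragma omp simd
  for (std::size_t i = 0; i < n; ++i)
    out[i] = toy_kernel(in[i]);
}
```

Without OpenMP SIMD support enabled the pragmas are ignored and this is
plain scalar code, so the sketch only shows the shape of the declaration.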
Also, we of course have no idea about the actual cost of the call
(but it's expensive). Nor do we have an idea of the scalar
vs. vector cost.
But as the PR shows, doing "nothing" isn't an option, at least
when, like on x86, there are both SSE and AVX variants and the
surrounding code would make the SSE variant (appear) cheaper.
Any good ideas?
Otherwise I'll try to extensively cover all libm builtins
(anticipating future SIMD-ification) in the same way, with
the same costs.
Thanks,
Richard.
PR target/125174
* config/i386/i386.cc (ix86_vector_costs::add_stmt_cost):
Cost sin, cos and exp calls as 10 times FMA.
---
gcc/config/i386/i386.cc | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index e73c2d7f7d0..6b271ac3fca 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -26602,6 +26602,13 @@ ix86_vector_costs::add_stmt_cost (int count, vect_cost_for_stmt kind,
case CFN_MULH:
stmt_cost = ix86_multiplication_cost (ix86_cost, mode);
break;
+ CASE_CFN_SIN:
+ CASE_CFN_COS:
+ CASE_CFN_EXP:
+ stmt_cost = 10 * ix86_vec_cost (mode,
+ mode == SFmode ? ix86_cost->fmass
+ : ix86_cost->fmasd);
+ break;
default:
break;
}
--
2.51.0