On Wed, Jan 09, 2019 at 11:10:25AM -0500, David Malcolm wrote:
> extern void vf1()
> {
> #pragma vectorize enable
> for ( int i = 0 ; i < 32768 ; i++ )
> data [ i ] = std::sqrt ( data [ i ] ) ;
> }
>
> Compiling on this x86_64 box with -fopt-info-vec-missed shows the
> _7 = .SQRT (_1);
> if (_1 u>= 0.0)
> goto <bb 8>; [99.95%]
> else
> goto <bb 4>; [0.05%]
>
> <bb 8> [local count: 1062472912]:
> goto <bb 5>; [100.00%]
>
> <bb 4> [local count: 531495]:
> __builtin_sqrtf (_1);
>
> I'm not sure where that control flow came from: it isn't in
> sqrt-test.cc.104t.stdarg
> but is in
> sqrt-test.cc.105t.cdce
> so I think it's coming from the argument-range code in cdce.
>
> Arguably the location on the statement is wrong: it's on the loop
> header, when it presumably should be on the std::sqrt call.

See my other mail: it is the result of the -fmath-errno default. The
inline-emitted sqrt doesn't handle errno setting, so we emit essentially
  x = sqrt (arg); if (__builtin_expect (arg < 0.0, 0)) sqrt (arg);
where the former sqrt is expanded inline using HW instructions and the
latter is the library call.
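
To make the shape of that expansion concrete, here is a minimal C++ sketch of
what the scalar code is roughly equivalent to. The names fast_sqrtf and
sqrt_with_errno are hypothetical; fast_sqrtf only stands in for the inline
hardware instruction and is not what GCC literally emits. It assumes a GCC or
Clang compiler for __builtin_expect.

```cpp
#include <cmath>
#include <limits>

// Hypothetical stand-in for the inline hardware sqrt: it produces the
// IEEE result (NaN for negative inputs) and deliberately never calls the
// library for a negative argument, so this path leaves errno alone.
static inline float fast_sqrtf(float x) {
    if (x < 0.0f)
        return std::numeric_limits<float>::quiet_NaN();
    return std::sqrt(x);
}

// Rough equivalent of the expansion described above:
//   x = sqrt (arg); if (__builtin_expect (arg < 0.0, 0)) sqrt (arg);
float sqrt_with_errno(float arg) {
    float x = fast_sqrtf(arg);            // fast path, always executed
    if (__builtin_expect(arg < 0.0f, 0)) {
        // Rare path: call the library only for its side effect on errno.
        volatile float ignored = std::sqrt(arg);
        (void)ignored;
    }
    return x;
}
```

The extra branch is what shows up as control flow inside the loop body, which
is why the loop is no longer a single basic block.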

With some extra work we could vectorize it; e.g. if we make the vectorizer
handle OpenMP #pragma omp ordered simd efficiently, it would be the same
thing: allow non-vectorizable portions of vectorized loops by running a
scalar loop from 0 to vf-1 there for the non-vectorizable stuff, and drop
the limitation that the vectorized loop is a single bb. Essentially, in
this case it would be
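  vec1 = vec_load (data + i);
  vec2 = vec_sqrt (vec1);
  /* Rare path: at least one lane has a negative argument.  */
  if (__builtin_expect (any (vec1 < 0.0), 0))
    {
      for (int j = 0; j < vf; j++)
        sqrt (vec1[j]);  /* Library call, only for the errno side effect.  */
    }
  vec_store (data + i, vec2);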
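
For illustration only, the same idea can be emulated in plain C++ with small
arrays standing in for vector registers. Everything here is an assumption
made for the sketch: VF is fixed at 4, sqrt_block and sqrt_all are
hypothetical names, the element count is assumed to be a multiple of VF, and
the per-lane loops merely mimic what real vector instructions would do. The
check is done on the input lanes, since those are what decide whether errno
needs setting.

```cpp
#include <cmath>
#include <cstddef>
#include <limits>

constexpr std::size_t VF = 4;  // assumed vectorization factor

// Process one block of VF floats in place.
void sqrt_block(float *data) {
    float vec1[VF], vec2[VF];
    bool any_negative = false;

    for (std::size_t j = 0; j < VF; j++) {
        vec1[j] = data[j];                          // emulates vec_load
        // Emulates vec_sqrt: IEEE result, no errno handling.
        vec2[j] = vec1[j] < 0.0f
                      ? std::numeric_limits<float>::quiet_NaN()
                      : std::sqrt(vec1[j]);
        any_negative |= (vec1[j] < 0.0f);           // emulates any (...)
    }

    if (__builtin_expect(any_negative, 0)) {
        // Rare scalar fallback over the lanes, only for the errno effect.
        for (std::size_t j = 0; j < VF; j++) {
            volatile float ignored = std::sqrt(vec1[j]);
            (void)ignored;
        }
    }

    for (std::size_t j = 0; j < VF; j++)
        data[j] = vec2[j];                          // emulates vec_store
}

// Apply sqrt_block to n floats; n is assumed to be a multiple of VF.
void sqrt_all(float *data, std::size_t n) {
    for (std::size_t i = 0; i < n; i += VF)
        sqrt_block(data + i);
}
```

The common case runs only the straight-line part; the scalar loop is entered
at most once per block and only when some input is negative.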

If that turns out to be way too hard, we could, for vectorization
purposes, hide that inside the .SQRT internal fn, say by adding a fndecl
argument to it if it should treat the exceptional cases in some way, so that
the control flow isn't visible in the vectorized loop.

Jakub