https://gcc.gnu.org/bugzilla/show_bug.cgi?id=125638

--- Comment #2 from Benjamin Schulz <schulz.benjamin at googlemail dot com> ---
Ideally, one would not need to add

#pragma omp simd

at all, and the auto-vectorizer would choose the best code on its own.

It already produces reasonable code for the CPU.

But it should then also vectorize the GPU code automatically. Unfortunately, the GPU code is currently only fast enough with an explicit omp simd statement.
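To make the two variants concrete, here is a minimal sketch of an offloaded loop written once without and once with the simd clause. It is not the benchmark from this report: the AXPY kernel, the function names axpy_plain and axpy_simd, and the map clauses are hypothetical choices for illustration. Built with something like `gcc -O3 -fopenmp -foffload=nvptx-none`, both loops are offloaded to the GPU; without an available device they fall back to the host, and without -fopenmp the pragmas are ignored and the loops run sequentially.

```c
#include <stddef.h>

/* Hypothetical kernel (not the benchmark from this report): y[i] += a * x[i]. */

/* Offloaded loop without simd: any vectorization across GPU lanes
   is left entirely to the compiler. */
void axpy_plain(size_t n, double a, const double *x, double *y)
{
    #pragma omp target teams distribute parallel for map(to: x[0:n]) map(tofrom: y[0:n])
    for (size_t i = 0; i < n; ++i)
        y[i] += a * x[i];
}

/* Same loop with an explicit simd clause on the combined construct. */
void axpy_simd(size_t n, double a, const double *x, double *y)
{
    #pragma omp target teams distribute parallel for simd map(to: x[0:n]) map(tofrom: y[0:n])
    for (size_t i = 0; i < n; ++i)
        y[i] += a * x[i];
}
```

Both functions compute the same result; only the code the compiler generates for them differs, which is where the timing gap described here shows up.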

Furthermore:

Another surprise comes when you run that benchmark with Clang, especially on the GPU. Clang has no support for OpenMP simd, but they had help from Nvidia to handle such loops automatically.

I must run it again, but if I remember correctly, last time I got 5 ms on the GPU for that benchmark with Clang and without optimisations, not 28 ms as with GCC at -O3 with simd, or 132 ms without simd on GCC. So there is clearly something wrong in the GPU code path.
