[Bug target/125638] openmp performance problem #pragma omp simd is necessary to get fast vectorized loops on gpu, whereas on cpu, automatic optimization is much faster than the simd statement for some loops

schulz.benjamin at googlemail dot com via Gcc-bugs Mon, 08 Jun 2026 01:56:54 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=125638


--- Comment #2 from Benjamin Schulz <schulz.benjamin at googlemail dot com> ---
Well, ideally one would not need to add

#pragma omp simd

at all, and the automatic vectorizer would choose the best code.

It clearly produces OK code for the CPU.

But it should then also automatically vectorize the GPU code, which is
unfortunately only fast enough at the moment with an explicit omp simd statement.
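A minimal sketch of the two variants being compared, assuming a simple offloaded loop. This is not the benchmark from the report: the kernel `y[i] += a * x[i]` and the function names `axpy_auto` and `axpy_simd` are hypothetical. The only difference is the `simd` clause on the combined `target teams distribute parallel for` construct. Without it, vectorization inside each thread is left to the compiler's automatic vectorizer, which per this report works well on the CPU but not on the GPU with GCC.

```c
#include <assert.h>

/* Hypothetical kernel, variant 1: no simd clause, so any vectorization
   inside each thread is left to the compiler's automatic vectorizer. */
void axpy_auto(int n, double a, const double *x, double *y)
{
    #pragma omp target teams distribute parallel for map(to: x[0:n]) map(tofrom: y[0:n])
    for (int i = 0; i < n; ++i)
        y[i] += a * x[i];
}

/* Hypothetical kernel, variant 2: identical loop with an explicit simd
   clause on the combined construct. */
void axpy_simd(int n, double a, const double *x, double *y)
{
    #pragma omp target teams distribute parallel for simd map(to: x[0:n]) map(tofrom: y[0:n])
    for (int i = 0; i < n; ++i)
        y[i] += a * x[i];
}
```

Both variants must compute the same result; only the generated code (and hence the timing) is expected to differ. Without `-fopenmp`, or with no offload device available, the loops simply run on the host.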

Furthermore:

Another surprise comes when you run that benchmark with Clang, especially on
the GPU. Clang has no support for OpenMP simd, but they had help from Nvidia to
handle these loops automatically...

I must run it again, but I think I remember that with Clang and without
optimizations, I last got 5 ms on the GPU for that benchmark, not 28 ms as with
GCC at -O3 with simd, or 132 ms with GCC without simd. So there is clearly
something wrong with the GPU code path.
