https://gcc.gnu.org/bugzilla/show_bug.cgi?id=123272

--- Comment #1 from Benjamin Schulz <schulz.benjamin at googlemail dot com> ---
Created attachment 63130
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=63130&action=edit
gpu_compiler_test_xnvptx-none.ii

Here is the .ii file from -save-temps for the miscompiled matrix
multiplication.

Interestingly, if one uses #pragma omp target teams distribute for the
first loop and #pragma omp parallel for for the second, the results are
correct. However, the collapse(2) clause is valid on the first two loops of
the matrix multiplication, and it is needed for performance if the matrices
are not square. The collapse clause works on the host, and even with gcc on
nvptx when using -O1.
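
For readers without the attachment, the two loop arrangements can be sketched
roughly as below. This is only a minimal illustration, not the code from the
attached .ii file: the function names, the flat row-major layout, the map
clauses and the exact combined directive used with collapse(2) are all
assumptions, and the sketch deliberately leaves out the templated class that
appears to be needed to trigger the wrong results.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper names; not taken from the attached test case.

// Variant A: the two outer loops are collapsed into one iteration space.
std::vector<double> matmul_collapse(const std::vector<double>& a,
                                    const std::vector<double>& b,
                                    std::size_t n, std::size_t m, std::size_t p)
{
    std::vector<double> c(n * p, 0.0);
    const double* pa = a.data();
    const double* pb = b.data();
    double* pc = c.data();
    #pragma omp target teams distribute parallel for collapse(2) \
        map(to: pa[0:n*m], pb[0:m*p]) map(from: pc[0:n*p])
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < p; ++j) {
            double sum = 0.0;
            for (std::size_t k = 0; k < m; ++k)
                sum += pa[i * m + k] * pb[k * p + j];
            pc[i * p + j] = sum;
        }
    return c;
}

// Variant B: teams distribute on the first loop, parallel for on the second.
std::vector<double> matmul_split(const std::vector<double>& a,
                                 const std::vector<double>& b,
                                 std::size_t n, std::size_t m, std::size_t p)
{
    std::vector<double> c(n * p, 0.0);
    const double* pa = a.data();
    const double* pb = b.data();
    double* pc = c.data();
    #pragma omp target teams distribute \
        map(to: pa[0:n*m], pb[0:m*p]) map(from: pc[0:n*p])
    for (std::size_t i = 0; i < n; ++i) {
        #pragma omp parallel for
        for (std::size_t j = 0; j < p; ++j) {
            double sum = 0.0;
            for (std::size_t k = 0; k < m; ++k)
                sum += pa[i * m + k] * pb[k * p + j];
            pc[i * p + j] = sum;
        }
    }
    return c;
}

// Fills an (n x m) and an (m x p) matrix with small integer values, so the
// products are exact in double, and compares the results of both variants.
bool variants_agree(std::size_t n, std::size_t m, std::size_t p)
{
    assert(n > 0 && m > 0 && p > 0);
    std::vector<double> a(n * m), b(m * p);
    for (std::size_t i = 0; i < a.size(); ++i) a[i] = static_cast<double>(i % 7);
    for (std::size_t i = 0; i < b.size(); ++i) b[i] = static_cast<double>(i % 5);
    return matmul_collapse(a, b, n, m, p) == matmul_split(a, b, n, m, p);
}
```

Both variants should produce identical output for any shape, for example
variants_agree(3, 4, 5) with non-square matrices. Built without OpenMP
support, the pragmas are ignored and the loops simply run serially on the
host.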

The wrong results are observed only when no optimization is enabled, which is
also strange, and only for the class with templates.
