[Bug target/121818] miscompilation of parallel for reduction on nvptx target in a cholesky decomposition

schulz.benjamin at googlemail dot com via Gcc-bugs Fri, 05 Sep 2025 16:44:41 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121818


--- Comment #1 from Benjamin Schulz <schulz.benjamin at googlemail dot com> ---
Created attachment 62321
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=62321&action=edit
arraylibrary.tar.gz

Source code of the array library. Again, if you remove the clause

reduction(+:tmp)

in line 1610 of mathfunctions_mpi.h,
before the loop


            T tmp=0,temp4=0;
            #pragma omp target parallel for simd reduction(+:tmp) device(policy.devicenum)
            for (size_t k = 0; k < c; ++k)
            {
                const T tmp3=tL(c,k);
                tmp+= tmp3 * tmp3;
            }

then the results come out correctly. That should not happen: removing the
reduction clause should not be what makes the computation correct, as this
clearly is a simple reduction.
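
A standalone check of this loop pattern could look like the sketch below. It is not taken from the attached library: the function names are hypothetical, a flat array stands in for tL(c,k), the device clause is left out so the default device is used, and the map clauses are an assumption about how the data would be transferred. It only compares the offloaded reduction against a serial sum, so it may or may not trigger the miscompilation seen in the full Cholesky code.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the loop in the report: sums the squares of the
// first c entries of `row` with an offloaded reduction. `row` replaces
// tL(c,k); the map clauses are an assumption, not part of the original code.
double sum_of_squares_offload(const std::vector<double>& row, std::size_t c)
{
    assert(c <= row.size());
    const double* p = row.data();
    double tmp = 0;
    #pragma omp target parallel for simd reduction(+:tmp) map(to: p[0:c]) map(tofrom: tmp)
    for (std::size_t k = 0; k < c; ++k)
    {
        const double tmp3 = p[k];
        tmp += tmp3 * tmp3;
    }
    return tmp;
}

// Plain serial reference without any OpenMP directive.
double sum_of_squares_serial(const std::vector<double>& row, std::size_t c)
{
    assert(c <= row.size());
    double ref = 0;
    for (std::size_t k = 0; k < c; ++k)
        ref += row[k] * row[k];
    return ref;
}

// Returns true if the offloaded reduction agrees with the serial sum.
// The inputs are multiples of 0.5, so their squares add up exactly in
// double precision regardless of the summation order.
bool reduction_matches_reference(std::size_t c)
{
    std::vector<double> row(c);
    for (std::size_t k = 0; k < c; ++k)
        row[k] = 0.5 * static_cast<double>(k % 7);
    const double got = sum_of_squares_offload(row, c);
    const double ref = sum_of_squares_serial(row, c);
    return std::fabs(got - ref) <= 1e-9 * (1.0 + std::fabs(ref));
}
```

Built with something like g++ -fopenmp -foffload=nvptx-none, a false result from reduction_matches_reference would point at the reduction itself. Without offloading support the target region falls back to the host, which gives a baseline for comparison.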

Originally, I had a teams distribute construct in front of it, and the behavior
was the same. I removed it because I did not know whether the reduction can be
carried out across all teams, but apparently it does not matter: the problem
also occurs if I reduce with just parallel for simd, which supports the
reduction clause on target in any case.
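
For comparison, the earlier variant with teams distribute might be sketched as follows. The exact directive used originally is not shown in this report, so the combined construct here is an assumption, as are the function name, the flat array in place of tL(c,k), and the map clauses.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical teams variant of the sum-of-squares loop. On this combined
// construct the reduction clause applies across teams as well as across the
// threads within each team.
double sum_of_squares_teams(const std::vector<double>& row, std::size_t c)
{
    assert(c <= row.size());
    const double* p = row.data();
    double tmp = 0;
    #pragma omp target teams distribute parallel for simd reduction(+:tmp) map(to: p[0:c]) map(tofrom: tmp)
    for (std::size_t k = 0; k < c; ++k)
    {
        const double tmp3 = p[k];
        tmp += tmp3 * tmp3;
    }
    return tmp;
}
```

If both this form and the parallel for simd form give wrong sums on nvptx while a serial loop does not, that would be consistent with the observation that the teams level is not what matters here.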
