https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122281

Tobias Burnus <burnus at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2025-10-30
     Ever confirmed|0                           |1
            Summary|libgomp: cuCtxSynchronize   |[OpenMP][SIMT] libgomp:
                   |error: an illegal memory    |cuCtxSynchronize error: an
                   |access was encountered in   |illegal memory access was
                   |code that reserves memory   |encountered in code that
                   |correctly.                  |reserves memory correctly.
           Keywords|                            |openmp
                 CC|                            |tschwinge at gcc dot gnu.org

--- Comment #4 from Tobias Burnus <burnus at gcc dot gnu.org> ---
This is an NVPTX-only issue:

With -O0, or with any optimization level combined with -foffload=disable or
-foffload=amdgcn-amdhsa, the result is:

In omplower:

[datablock.h:703:25] #pragma omp atomic_load relaxed
D.234487 = *&count
[datablock.h:703:25] D.234488 = D.234487 + 1;
[datablock.h:703:25] #pragma omp atomic_store relaxed (D.234488)

and then in ompexp:

  <bb 10> :
  [datablock.h:703:25] _32 = .omp_data_i_8(D)->count;
  [datablock.h:703:25 discrim 1] __atomic_fetch_add_8 (_32, 1, 0);

Which looks fine.
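
For orientation, here is a minimal sketch of the kind of source that produces such an atomic_load/atomic_store pair. It is not the reporter's code: the function name, the types, and the directive are assumptions, and only the "compare with 0.0, then atomically increment count" shape is taken from the dump above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the loop around datablock.h:698-703.
   Function name, types, and the directive are assumptions. */
long count_zeros(const double *data, size_t n)
{
    long count = 0;

    assert(data != NULL || n == 0);

    /* Ignored when compiled without -fopenmp; the serial result is the same. */
    #pragma omp parallel for shared(count)
    for (size_t i = 0; i < n; i++) {
        if (data[i] == 0.0) {
            /* The atomic update that omplower shows as an
               atomic_load/atomic_store pair. */
            #pragma omp atomic
            count++;
        }
    }
    return count;
}
```

Because 'count' is shared, the outlined region accesses it through a pointer in the data-sharing record, which is what the '.omp_data_i->count' load in the ompexp dump corresponds to.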

However, for -foffload=nvptx-none:


* omplower duplicates this code to:

[datablock.h:698:34] #pragma omp for nowait private(i.152)
for (i.152 = 0; i.152 < D.234916; i.152 = i.152 + 1)
....
[datablock.h:701:22] D.234805 = [datablock.h:701:22] *D.234804;
[datablock.h:701:13] if (D.234805 == 0.0) goto <D.234878>; else goto <D.234879>;
<D.234878>:
[datablock.h:703:25] D.234913 = .omp_data_i->count;
[datablock.h:703:25] #pragma omp atomic_load relaxed
D.234808 = *D.234913
[datablock.h:703:25] D.234809 = D.234808 + 1;
[datablock.h:703:25] #pragma omp atomic_store relaxed (D.234809)
goto <D.234880>;

...
  #pragma omp return(nowait)
}
goto <D.234873>;
<D.234872>:
...
[datablock.h:701:22] D.234805 = [datablock.h:701:22] *D.234804;
[datablock.h:701:13] if (D.234805 == 0.0) goto <D.234882>; else goto <D.234883>;
<D.234882>:
[datablock.h:703:25] #pragma omp atomic_load relaxed
D.234808 = *&*D.234913
[datablock.h:703:25] D.234809 = D.234808 + 1;
[datablock.h:703:25] #pragma omp atomic_store relaxed (D.234809)
goto <D.234884>;



This still looks mostly okay, but in ompexp it gets converted to:

<bb 43> :
[datablock.h:703:25 discrim 3] __atomic_fetch_add_8 (D.235057, 1, 0);

--

<bb 37> :
[datablock.h:703:25] D.235057 = .omp_data_i->count;
[datablock.h:703:25 discrim 1] __atomic_fetch_add_8 (D.235057, 1, 0);


Obviously, the load 'D.235057 = .omp_data_i->count' is missing in <bb 43>;
only <bb 37> has it. It could be hoisted, but as written, it needs to be
there, especially as the loop is executed multiple times.
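
As a rough C model of what the expanded code would need to look like, the sketch below gives each of the two copies its own pointer load before the atomic add. The struct layout, the function name, and the 'use_second_copy' selector are hypothetical; only the 'count' field and the relaxed fetch-add (memory order 0 is __ATOMIC_RELAXED) come from the dumps. It relies on the GCC/Clang __atomic builtins.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of the data-sharing record; only the 'count'
   field name is taken from the dump. */
struct omp_data {
    long long *count;
};

/* Models the two copies of the loop body emitted for nvptx.
   'use_second_copy' is a made-up selector standing in for whatever
   condition chooses the second copy. */
void bump_count(const struct omp_data *omp_data_i, int use_second_copy)
{
    assert(omp_data_i != NULL && omp_data_i->count != NULL);

    if (!use_second_copy) {
        /* First copy: load is present, as in <bb 37>. */
        long long *p = omp_data_i->count;
        __atomic_fetch_add(p, 1, __ATOMIC_RELAXED);
    } else {
        /* Second copy: this is the load that <bb 43> lacks. */
        long long *p = omp_data_i->count;
        __atomic_fetch_add(p, 1, __ATOMIC_RELAXED);
    }
}
```

Without the load in the second branch, 'p' would only hold a valid address if the first branch happened to run earlier, which matches the illegal memory access seen at run time.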

I wonder whether an 'unshare_expr' is missing here?
