https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105001

Thomas Schwinge <tschwinge at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |amonakov at gcc dot gnu.org

--- Comment #2 from Thomas Schwinge <tschwinge at gcc dot gnu.org> ---
Likewise:

    +PASS: libgomp.c/pr104783-2.c (test for excess errors)
    +FAIL: libgomp.c/pr104783-2.c execution test

... that got added in commit r12-7765-ga624388b9546b066250be8baa118b7d50c403c25
"[nvptx] Add warp sync at simt exit".

---

Looking at 'libgomp.c/pr104783.c' (added in commit
r12-7586-gf07178ca3c1e5dff799fb5016bb3767571db3165 "[nvptx] Disable warp sync
in simt region").

The problem disappears with '-O0'.  This does coincide with the non-appearance
('-O0') vs. appearance ('-O1' and higher) of the '-Wuninitialized' diagnostic:

    source-gcc/libgomp/testsuite/libgomp.c/pr104783.c: In function
‘main._omp_fn.0’:
    source-gcc/libgomp/testsuite/libgomp.c/pr104783.c:10:9: warning:
‘<anonymous>’ is used uninitialized [-Wuninitialized]
       10 | #pragma omp atomic update
          |         ^~~
    source-gcc/libgomp/testsuite/libgomp.c/pr104783.c:10:9: note: ‘<anonymous>’
was declared here
       10 | #pragma omp atomic update
          |         ^~~

This one is emitted by the host compiler (!) -- and again disappears if
omitting nvptx offloading compilation, via explicit '-foffload=amdgcn-amdhsa',
for example.

Therefore, 'diff' of host-side '-O1 -fdump-tree-all' without vs. with nvptx
offloading, already in 'a-pr104783.c.009t.omplower':

    @@ -13,30 +13,57 @@
           {
             val = 0;
             {
    -          .omp_data_arr.2.val = &val;
    +          .omp_data_arr.4.val = &val;
    -          #pragma omp target num_teams(1) thread_limit(0) map(tofrom:val
[len: 4]) [child fn: main._omp_fn.0 (.omp_data_arr.2, .omp_data_sizes.3,
.omp_data_kinds.4)]
    +          #pragma omp target num_teams(1) thread_limit(0) map(tofrom:val
[len: 4]) [child fn: main._omp_fn.0 (.omp_data_arr.4, .omp_data_sizes.5,
.omp_data_kinds.6)]
                 {
    -              .omp_data_i = (const struct .omp_data_t.1 & restrict)
&.omp_data_arr.2;
    +              .omp_data_i = (const struct .omp_data_t.1 & restrict)
&.omp_data_arr.4;
                   {
                     int i;

                     {
    +                  int D.2127;
    +
    +                  D.2127 = .GOMP_USE_SIMT ();
    +                  if (D.2127 != 0) goto <D.2128>; else goto <D.2129>;
    +                  <D.2128>:
    +                  {
    +                    void * simduid.2;
    +                    void * .omp_simt.3;
    +                    int i;
    +
    +                    simduid.2 = .GOMP_SIMT_ENTER (simduid.2);
    +                    .omp_simt.3 = .GOMP_SIMT_ENTER_ALLOC (simduid.2);
    +                    #pragma omp simd _simduid_(simduid.2) _simt_
linear(i:1)
    +                    for (i = 0; i < 1; i = i + 1)
    +                    D.2135 = .omp_data_i->val;
    +                    #pragma omp atomic_load relaxed
    +                      D.2116 = *D.2135
    +                    D.2117 = D.2116 + 1;
    +                    #pragma omp atomic_store relaxed (D.2117)
    +                    #pragma omp continue (i, i)
    +                    .GOMP_SIMT_EXIT (.omp_simt.3);
    +                    #pragma omp return(nowait)
    +                  }
    +                  goto <D.2130>;
    +                  <D.2129>:
    +                  {
                       int i;

                       #pragma omp simd linear(i:1)
                       for (i = 0; i < 1; i = i + 1)
    -                  D.2128 = .omp_data_i->val;
                       #pragma omp atomic_load relaxed
    -                    D.2116 = *D.2128
    +                      D.2116 = *&*D.2135
                       D.2117 = D.2116 + 1;
                       #pragma omp atomic_store relaxed (D.2117)
                       #pragma omp continue (i, i)
                       #pragma omp return(nowait)
                     }
    +                  <D.2130>:
    +                }
                   }
                   #pragma omp return
                 }
    -          .omp_data_arr.2 = {CLOBBER};
    +          .omp_data_arr.4 = {CLOBBER};
             }
             val.0_1 = val;
             if (val.0_1 != 1) goto <D.2118>; else goto <D.2119>;

Notice code changes outside of the 'if ([.GOMP_USE_SIMT ()])':

    -                  D.2128 = .omp_data_i->val;
                       #pragma omp atomic_load relaxed
    -                    D.2116 = *D.2128
    +                      D.2116 = *&*D.2135

..., and notice that 'D.2135' however is only set in the 'if' branch, but now
also used in the 'else' branch!  That may well be the origin of the
'-Wuninitialized' and SIGSEGV observed for non-SIMT?

(I didn't think about it very much, but can't we have a GIMPLE-level
consistency check for such a thing?)

---

Then, I don't know very much about the host-side implementation of OpenMP
lowering for nvptx SIMT (which is what we're seeing here), but it seems
"non-optimal" to me that the general host-side IR changes (even if wrapped in
'if ([.GOMP_USE_SIMT ()])'), depending on which offloading targets happen to be
enabled?  Wouldn't it be better to clone the affected functions for nvptx SIMT
compilation, or -- even better? -- generally introduce appropriate
abstractions, that the host/offload targets then later lower appropriately
(whether SIMT or non-SIMT; for example, in 'pass_omp_device_lower')?  (That'd
be conceptually similar to what we're doing for OpenACC.)

Reply via email to