https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105001
Thomas Schwinge <tschwinge at gcc dot gnu.org> changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |amonakov at gcc dot gnu.org --- Comment #2 from Thomas Schwinge <tschwinge at gcc dot gnu.org> --- Likewise: +PASS: libgomp.c/pr104783-2.c (test for excess errors) +FAIL: libgomp.c/pr104783-2.c execution test ... that got added in commit r12-7765-ga624388b9546b066250be8baa118b7d50c403c25 "[nvptx] Add warp sync at simt exit". --- Looking at 'libgomp.c/pr104783.c' (added in commit r12-7586-gf07178ca3c1e5dff799fb5016bb3767571db3165 "[nvptx] Disable warp sync in simt region"). The problem disappears with '-O0'. This does coincide with the non-appearance ('-O0') vs. appearance ('-O1' and higher) of the '-Wuninitialized' diagnostic: source-gcc/libgomp/testsuite/libgomp.c/pr104783.c: In function ‘main._omp_fn.0’: source-gcc/libgomp/testsuite/libgomp.c/pr104783.c:10:9: warning: ‘<anonymous>’ is used uninitialized [-Wuninitialized] 10 | #pragma omp atomic update | ^~~ source-gcc/libgomp/testsuite/libgomp.c/pr104783.c:10:9: note: ‘<anonymous>’ was declared here 10 | #pragma omp atomic update | ^~~ This one is emitted by the host compiler (!) -- and again disappears if omitting nvptx offloading compilation, via explicit '-foffload=amdgcn-amdhsa', for example. Therefore, 'diff' of host-side '-O1 -fdump-tree-all' without vs. with nvptx offloading, already in 'a-pr104783.c.009t.omplower': @@ -13,30 +13,57 @@ { val = 0; { - .omp_data_arr.2.val = &val; + .omp_data_arr.4.val = &val; - #pragma omp target num_teams(1) thread_limit(0) map(tofrom:val [len: 4]) [child fn: main._omp_fn.0 (.omp_data_arr.2, .omp_data_sizes.3, .omp_data_kinds.4)] + #pragma omp target num_teams(1) thread_limit(0) map(tofrom:val [len: 4]) [child fn: main._omp_fn.0 (.omp_data_arr.4, .omp_data_sizes.5, .omp_data_kinds.6)] { - .omp_data_i = (const struct .omp_data_t.1 & restrict) &.omp_data_arr.2; + .omp_data_i = (const struct .omp_data_t.1 & restrict) &.omp_data_arr.4; { int i; { + int D.2127; + + D.2127 = .GOMP_USE_SIMT (); + if (D.2127 != 0) goto <D.2128>; else goto <D.2129>; + <D.2128>: + { + void * simduid.2; + void * .omp_simt.3; + int i; + + simduid.2 = .GOMP_SIMT_ENTER (simduid.2); + .omp_simt.3 = .GOMP_SIMT_ENTER_ALLOC (simduid.2); + #pragma omp simd _simduid_(simduid.2) _simt_ linear(i:1) + for (i = 0; i < 1; i = i + 1) + D.2135 = .omp_data_i->val; + #pragma omp atomic_load relaxed + D.2116 = *D.2135 + D.2117 = D.2116 + 1; + #pragma omp atomic_store relaxed (D.2117) + #pragma omp continue (i, i) + .GOMP_SIMT_EXIT (.omp_simt.3); + #pragma omp return(nowait) + } + goto <D.2130>; + <D.2129>: + { int i; #pragma omp simd linear(i:1) for (i = 0; i < 1; i = i + 1) - D.2128 = .omp_data_i->val; #pragma omp atomic_load relaxed - D.2116 = *D.2128 + D.2116 = *&*D.2135 D.2117 = D.2116 + 1; #pragma omp atomic_store relaxed (D.2117) #pragma omp continue (i, i) #pragma omp return(nowait) } + <D.2130>: + } } #pragma omp return } - .omp_data_arr.2 = {CLOBBER}; + .omp_data_arr.4 = {CLOBBER}; } val.0_1 = val; if (val.0_1 != 1) goto <D.2118>; else goto <D.2119>; Notice code changes outside of the 'if ([.GOMP_USE_SIMT ()])': - D.2128 = .omp_data_i->val; #pragma omp atomic_load relaxed - D.2116 = *D.2128 + D.2116 = *&*D.2135 ..., and notice that 'D.2135' however is only set in the 'if' branch, but now also used in the 'else' branch! That may well be the origin of the '-Wuninitialized' and SIGSEGV observed for non-SIMT? (I didn't think about it very much, but can't we have a GIMPLE-level consistency check for such a thing?) --- Then, I don't know very much about the host-side implementation of OpenMP lowering for nvptx SIMT (which is what we're seeing here), but it seems "non-optimal" to me that the general host-side IR changes (even if wrapped in 'if ([.GOMP_USE_SIMT ()])'), depending on which offloading targets happen to be enabled? Wouldn't it be better to clone the affected functions for nvptx SIMT compilation, or -- even better? -- generally introduce appropriate abstractions, that the host/offload targets then later lower appropriately (whether SIMT or non-SIMT; for example, in 'pass_omp_device_lower')? (That'd be conceptually similar to what we're doing for OpenACC.)