https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85204
Bug ID: 85204
Summary: [nvptx] infinite loop generated
Product: gcc
Version: 8.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: vries at gcc dot gnu.org
Target Milestone: ---

Consider the following testcase (broadcast-1.c in the og7 branch):
...
#include <assert.h>
#include <math.h>

#define N 1024

int A[N][N];

void
test (int x)
{
#pragma acc parallel num_gangs(16) num_workers(4) vector_length(32) copyout(A)
  {
#pragma acc loop gang
    for (int j = 0; j < N; j++)
      {
        if (x == 1)
          {
#pragma acc loop worker vector
            for (int i = 0; i < N; i++)
              A[i][j] = 1;
          }
        else
          {
#pragma acc loop worker vector
            for (int i = 0; i < N; i++)
              A[i][j] = -1;
          }
      }
  }
}

int
main (void)
{
  test (0);
  for (int i = 0; i < N; i++)
    for (int j = 0; j < N; j++)
      assert (A[i][j] == -1);

  test (1);
  for (int i = 0; i < N; i++)
    for (int j = 0; j < N; j++)
      assert (A[i][j] == 1);

  return 0;
}
...

At -O2, the backend generates:
...
  {
    .reg .u32 %y;
    mov.u32 %y,%tid.y;
    setp.ne.u32 %r91,%y,0;
  }
  {
    .reg .u32 %x;
    mov.u32 %x,%tid.x;
    setp.ne.u32 %r92,%x,0;
  }
  ...
$L4:
  @ %r91 bra.uni $L24;
  selp.u32 %r95,1,0,%r80;
  st.shared.u32 [__worker_bcast],%r95;
$L25:
$L24:
  @ %r92 bra $L25;
...

Note the infinite loop at the branch to $L25: $L25 falls straight through to $L24, where the same branch is executed again, so any thread with %tid.x != 0 (%r92 set) never leaves it. Not surprisingly, the testcase hangs.

This looks like neutering gone wrong: probably the jump "@ %r92 bra $L25" is a vector-neutering jump and should be placed after the worker-neutering jump "@ %r91 bra.uni $L24".

The failure was reported here: https://gcc.gnu.org/ml/gcc-patches/2016-10/msg02187.html (though the root cause was misanalyzed there, and the proposed patch is incorrect because it introduces a diverging bra.uni).
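To make the control flow concrete, here is a small, hypothetical C model (not part of GCC) that treats each thread as executing the quoted PTX sequentially. It ignores warp divergence and reconvergence, and treats bra.uni like a plain bra. The names insn, runs_to_end, generated and reordered are illustrative only. The reordered table is an assumption about the intended placement, with the vector-neutering jump moved directly after the worker-neutering jump as suggested above. It is not taken from an actual GCC patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, sequential per-thread model of the quoted PTX.
   Ignores warp divergence/reconvergence; bra.uni is treated as bra.  */
enum kind { BRANCH, BODY, LABEL, END };
enum pred { P_WORKER, P_VECTOR }; /* %r91: tid.y != 0, %r92: tid.x != 0.  */

struct insn
{
  enum kind kind;
  enum pred pred; /* BRANCH only: guarding predicate.  */
  int target;     /* BRANCH only: index of the target LABEL.  */
};

/* The sequence as generated at -O2.  */
const struct insn generated[] = {
  { .kind = BRANCH, .pred = P_WORKER, .target = 4 }, /* @ %r91 bra.uni $L24 */
  { .kind = BODY },                                  /* selp.u32 ...        */
  { .kind = BODY },                                  /* st.shared.u32 ...   */
  { .kind = LABEL },                                 /* $L25:               */
  { .kind = LABEL },                                 /* $L24:               */
  { .kind = BRANCH, .pred = P_VECTOR, .target = 3 }, /* @ %r92 bra $L25     */
  { .kind = END },
};

/* Assumed intended order: vector neutering nested inside worker neutering.  */
const struct insn reordered[] = {
  { .kind = BRANCH, .pred = P_WORKER, .target = 5 }, /* @ %r91 bra.uni $L24 */
  { .kind = BRANCH, .pred = P_VECTOR, .target = 4 }, /* @ %r92 bra $L25     */
  { .kind = BODY },                                  /* selp.u32 ...        */
  { .kind = BODY },                                  /* st.shared.u32 ...   */
  { .kind = LABEL },                                 /* $L25:               */
  { .kind = LABEL },                                 /* $L24:               */
  { .kind = END },
};

#define LEN(a) ((int) (sizeof (a) / sizeof ((a)[0])))

/* Returns true if a thread with the given tid.x/tid.y reaches END within
   MAX_STEPS executed instructions; false means it is (presumably) stuck.  */
bool
runs_to_end (const struct insn *code, int n, int tid_x, int tid_y,
             int max_steps)
{
  int pc = 0;
  for (int step = 0; step < max_steps; step++)
    {
      assert (pc >= 0 && pc < n);
      const struct insn *i = &code[pc];
      if (i->kind == END)
        return true;
      if (i->kind == BRANCH)
        {
          /* setp.ne sets the predicate for a non-zero thread index.  */
          bool taken = i->pred == P_WORKER ? tid_y != 0 : tid_x != 0;
          pc = taken ? i->target : pc + 1;
        }
      else
        pc++; /* BODY and LABEL fall through.  */
    }
  return false; /* Step budget exhausted: treat as an infinite loop.  */
}
```

With a budget of, say, 100 steps, runs_to_end returns false for the generated table whenever tid_x is non-zero (whatever tid_y is), and true for every thread with the reordered table. The step budget stands in for real hang detection: either seven-entry sequence finishes in at most seven steps when it does not loop.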