Hello,

I'd like to provide an overview of the gomp-nvptx branch status. In response to this message I'll send two more emails, with the libgomp and middle-end changes on the branch. Some of the changes to libgomp, such as the build machinery adaptations, already received substantial comments in 2015, but I believe the middle-end stuff is mostly unreviewed.

Middle-end changes mostly amount to adding SIMD-to-SIMT transforms in omp-low.c, as shown at the Cauldron. SIMT outlining via gimplifier abuse is not on the branch, and neither is cloning of SIMD/SIMT loops. Outlining is required for correctness, and cloning is useful because it avoids intermixing SIMD and SIMT, which guarantees that SIMT lowering does not 'dirty' SIMD loops and regress host/MIC vectorization. I could argue that it's possible to improve my SIMT lowering to avoid some dirtying (for example by moving loop-invariant calls to GOMP_SIMT_VF()), but the need for outlining makes that moot anyway, I think.

To get great performance this will need further changes everywhere, including in target-independent code, due to accidents like this bug (which I'd like to ping given the topic):
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68706

With OpenMP/PTX offloading there are 5 additional failures in check-target-libgomp:

Two due to tests using 'usleep' in a target region:
FAIL: libgomp.c/target-32.c (test for excess errors)
FAIL: libgomp.c/thread-limit-2.c (test for excess errors)

Two with 'target nowait' (not implemented):
FAIL: libgomp.c/target-33.c execution test
FAIL: libgomp.c/target-34.c execution test

One with 'target link' (not implemented):
FAIL: libgomp.c/target-link-1.c (test for excess errors)

The 'target nowait' and 'target link' failures can eventually be fixed by implementing those two missing OpenMP 4.5 features. As for the 'usleep' failures, I don't think it's good to have tests that rely on it, but eventually I'd like to provide a port of musl libc for PTX, which would also provide usleep (either as a no-op stub or based on a busy loop). In the short term, it should be possible to implement something like -foffload=^nvptx to skip PTX (and only PTX) offloading on those tests.

Thanks.
Alexander
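
P.S. To make the busy-loop variant of usleep mentioned above a bit more concrete, here is a rough sketch in portable C. It is only an illustration and not code from the branch: the name busy_usleep is hypothetical, and the use of clock_gettime is an assumption that holds on a POSIX host. A PTX port would have to read some device-side timer instead.

```c
#define _POSIX_C_SOURCE 199309L
#include <time.h>

/* Hypothetical busy-loop replacement for usleep (illustration only).
   Spins until at least USEC microseconds have elapsed on the monotonic
   clock.  Returns 0 on success, -1 if the clock cannot be read.  */
static int
busy_usleep (unsigned int usec)
{
  struct timespec start, now;
  long long target_ns = (long long) usec * 1000LL;

  if (clock_gettime (CLOCK_MONOTONIC, &start) != 0)
    return -1;

  for (;;)
    {
      long long elapsed_ns;

      if (clock_gettime (CLOCK_MONOTONIC, &now) != 0)
        return -1;

      /* Elapsed time since START, in nanoseconds.  */
      elapsed_ns = (long long) (now.tv_sec - start.tv_sec) * 1000000000LL
                   + (long long) (now.tv_nsec - start.tv_nsec);
      if (elapsed_ns >= target_ns)
        return 0;
    }
}
```

The no-op alternative would simply return 0 right away. That is enough to get the tests to link, but it changes their timing behaviour.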