Hello,

I'd like to provide an overview of the gomp-nvptx branch status. In reply to
this message I'll send two more emails, covering the libgomp and middle-end
changes on the branch.  Some of the libgomp changes, such as the build machinery
adaptations, already received substantial comments in 2015, but I believe the
middle-end changes are mostly unreviewed.

Middle-end changes mostly amount to adding SIMD-to-SIMT transforms in omp-low.c,
as shown at the Cauldron.  SIMT outlining via gimplifier abuse is not there, and
neither is cloning of SIMD/SIMT loops.  Outlining is required for correctness,
and cloning is useful because it avoids intermixing SIMD and SIMT, so we can be
sure that SIMT lowering does not 'dirty' SIMD loops and regress host/MIC
vectorization.  I could argue that it's possible to improve my SIMT lowering to
avoid some dirtying (like moving loop-invariant calls to GOMP_SIMT_VF()), but
the need for outlining makes that moot anyway, I think.
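
To illustrate the general idea of a SIMD-to-SIMT transform (this is only a
host-side sketch, not the code on the branch): the function names and the
constant SIMT_VF below are hypothetical, and the constant merely stands in for
the value that GOMP_SIMT_VF() would supply.  Each lane starts at its own index
and strides by the vectorization factor, so the lanes together cover the same
iteration space as the original simd loop.

```c
#include <stddef.h>

/* Assumed lane count, standing in for GOMP_SIMT_VF(); 32 matches a PTX warp.  */
enum { SIMT_VF = 32 };

/* What the user wrote: a loop the host compiler may vectorize.  */
void saxpy_simd (size_t n, float a, const float *x, float *y)
{
  #pragma omp simd
  for (size_t i = 0; i < n; i++)
    y[i] += a * x[i];
}

/* Roughly what a single lane would execute after the transform: start at
   the lane's own index and stride by the vectorization factor.  */
void saxpy_simt_lane (size_t n, float a, const float *x, float *y,
                      size_t lane)
{
  for (size_t i = lane; i < n; i += SIMT_VF)
    y[i] += a * x[i];
}

/* Host-side emulation: run every lane in turn instead of in lockstep.  */
void saxpy_simt_emulated (size_t n, float a, const float *x, float *y)
{
  for (size_t lane = 0; lane < SIMT_VF; lane++)
    saxpy_simt_lane (n, a, x, y, lane);
}
```

Both variants touch every element exactly once, so they should produce
identical results for loops without cross-iteration dependencies.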

To get great performance this will need further changes everywhere, including
in target-independent code, due to accidents like this bug (which I'd like to
ping given the topic): https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68706

With OpenMP/PTX offloading there are 5 additional failures in
check-target-libgomp:

Two due to tests using 'usleep' in a target region:
FAIL: libgomp.c/target-32.c (test for excess errors)
FAIL: libgomp.c/thread-limit-2.c (test for excess errors)

Two with 'target nowait' (not implemented):
FAIL: libgomp.c/target-33.c execution test
FAIL: libgomp.c/target-34.c execution test

One with 'target link' (not implemented):
FAIL: libgomp.c/target-link-1.c (test for excess errors)

The last three can eventually be fixed by implementing the two missing OpenMP
4.5 features ('target nowait' and 'target link').  As for the 'usleep' failures:
while I don't think tests should rely on it, I'd eventually like to provide a
port of musl libc for PTX, which would also provide usleep (either as a no-op
stub or based on a busy loop).
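
For illustration only, the two usleep options could look roughly like the
sketch below.  The names usleep_stub and busy_usleep are hypothetical, and the
busy loop uses ISO C clock() so that it runs on a host; a PTX port would
presumably read a hardware timer instead, which is not shown here.

```c
#include <time.h>

/* Option 1: a no-op stub that reports success without waiting.  */
int usleep_stub (unsigned long usec)
{
  (void) usec;
  return 0;
}

/* Option 2: spin until at least USEC microseconds of processor time have
   passed.  Returns -1 if the clock is unavailable.  */
int busy_usleep (unsigned long usec)
{
  clock_t start = clock ();
  if (start == (clock_t) -1)
    return -1;

  /* Convert the requested delay into clock ticks.  */
  clock_t ticks = (clock_t) ((double) usec * CLOCKS_PER_SEC / 1e6);

  while (clock () - start < ticks)
    ;  /* busy wait */
  return 0;
}
```

The stub is enough to make such tests link, while the busy loop keeps their
timing behaviour closer to what the test author intended.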

In the short term, it should be possible to implement something like
-foffload=^nvptx to skip PTX (and only PTX) offloading on those tests.

Thanks.
Alexander
