https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96932
--- Comment #4 from Tom de Vries <vries at gcc dot gnu.org> ---
(In reply to Tobias Burnus from comment #3)
> Crossref: PR100497 - fails on Volta without
>    membar.sys;
> before
>    atom.global.exch.b32
>
> Unfortunately, compared to pre-Volta, it is very slow - membar.gl is still
> slow but a bit less. Using (→ sm_70) fence.sys / fence.gnu instead of
> fence.sc.{sys,gnu} (= membar.{sys,gl} on >= sm_70) does not seem to make a

fence.sc.gpu, funny typo :)

> performance difference for PR100497.

The GOMP_atomic_start/GOMP_atomic_end are fallbacks, and unfortunately cannot
be expected to be too optimal.

Following the introduction of -mptx=6.3 we can add support for atom.cas.b16
(well, once we also introduce misa=sm_70), and that should be the optimal
solution.
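
For illustration only, here is a rough host-side C sketch of the difference between the two approaches on a 16-bit value. It is not the libgomp or nvptx implementation: fallback_atomic_start/fallback_atomic_end are hypothetical stand-ins for a GOMP_atomic_start/GOMP_atomic_end-style global lock, and the C11 compare-exchange merely plays the role that a native 16-bit CAS such as atom.cas.b16 would play on the device.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for a GOMP_atomic_start/GOMP_atomic_end style
   fallback: one global lock serializes every "atomic" region.  */
static atomic_flag fallback_lock = ATOMIC_FLAG_INIT;

void fallback_atomic_start(void)
{
  /* Spin until the flag was previously clear, i.e. we own the lock.  */
  while (atomic_flag_test_and_set_explicit(&fallback_lock,
                                           memory_order_acquire))
    ;
}

void fallback_atomic_end(void)
{
  atomic_flag_clear_explicit(&fallback_lock, memory_order_release);
}

/* Fallback path: plain read-modify-write protected by the global lock.
   Returns the value seen before the update.  */
uint16_t add_u16_locked(uint16_t *p, uint16_t v)
{
  fallback_atomic_start();
  uint16_t old = *p;
  *p = (uint16_t)(old + v);
  fallback_atomic_end();
  return old;
}

/* CAS path: retry loop on a 16-bit compare-and-swap, no lock involved.
   Returns the value seen before the update.  */
uint16_t add_u16_cas(_Atomic uint16_t *p, uint16_t v)
{
  uint16_t old = atomic_load_explicit(p, memory_order_relaxed);
  /* On failure, 'old' is refreshed with the current value and we retry.  */
  while (!atomic_compare_exchange_weak_explicit(p, &old,
                                                (uint16_t)(old + v),
                                                memory_order_seq_cst,
                                                memory_order_relaxed))
    ;
  return old;
}

/* Single-threaded sanity check that both paths compute the same result.  */
int paths_agree(uint16_t start, uint16_t v)
{
  uint16_t a = start;
  _Atomic uint16_t b = start;
  uint16_t old_a = add_u16_locked(&a, v);
  uint16_t old_b = add_u16_cas(&b, v);
  assert(old_a == old_b);
  return a == atomic_load(&b);
}
```

The locked variant makes every atomic region in the program contend on the same lock, whatever address it touches, which is one reason such a fallback is hard to make fast. The CAS variant only contends with other updates to the same location.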