Bug ID: 85341
Summary: [nvptx] Implement atomic load
Assignee: unassigned at gcc dot gnu.org
Reporter: vries at gcc dot gnu.org
Target Milestone: ---
[ Follow-up PR of PR84041 - "[nvptx] Hang in for-3.c" ]
At the moment the nvptx port does not define an atomic load insn. Consequently,
__atomic_load goes through the fallback path in expand_atomic_load and ends up
generating a regular load insn combined with a membar.sys memory barrier.
The __atomic_load builtin is defined as:
Built-in Function: type __atomic_load_n (type *ptr, int memorder)
This built-in function implements an atomic load operation. It returns the
contents of *ptr.
The valid memory order variants are __ATOMIC_RELAXED, __ATOMIC_SEQ_CST,
__ATOMIC_ACQUIRE, and __ATOMIC_CONSUME.
The atomic_load insn pattern is described like this (with a local fix applied
for https://gcc.gnu.org/ml/gcc-patches/2018-04/msg00517.html ):
This pattern implements an atomic load operation with memory model
semantics. Operand 1 is the memory address being loaded from. Operand 0 is the
result of the load. Operand 2 is the memory model to be used for the load.
If not present, the __atomic_load built-in function will resort to a normal
load with memory barriers.
If we defined an atomic_load insn pattern, we would be able to use the pointer
operand to deduce a reduced scope (.gpu or .cta) for the memory barrier.
Say we define memory spaces __global and __shared; then we could use
membar.gpu for __global and membar.cta for __shared.
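A sketch of the PTX this could produce (illustrative only; the state-space
qualifiers and register names are assumptions, not actual compiler output):

```
// Current fallback: plain load plus a system-wide barrier.
ld.u32        %r0, [addr];
membar.sys;             // conservatively orders across the whole system

// With an atomic_load pattern that sees the address space:
ld.global.u32 %r1, [gaddr];
membar.gpu;             // __global: scope reduced to the device

ld.shared.u32 %r2, [saddr];
membar.cta;             // __shared: scope reduced to the thread block
```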
Of course, we'd have to annotate libgomp/config/nvptx with the appropriate
memory spaces, otherwise we'd keep generating the same code there.