================
@@ -3577,6 +3577,29 @@ An error will be given if:
   }];
 }
 
+def AMDGCNAVDocs : Documentation {
+  let Category = DocCatAMDGPUAttributes;
+  let Content = [{
+This attribute controls MakeAvailable and MakeVisible cache operations on
+AMDGPU synchronization operations. It takes a string argument specifying the
+mode.
+
+When placed on a statement containing a C/C++ atomic builtin call, the
+resulting atomic or fence instruction will carry ``!mmra !{!"amdgcn-av",
+!"<mode>"}`` metadata.
+
+The supported modes are:
+
+- ``"none"``: Skip cache writeback (on release) and cache invalidation (on
+  acquire), while preserving memory ordering (waits).
+
+.. code-block:: c++
+
+  [[clang::amdgcn_av("none")]] __atomic_thread_fence(__ATOMIC_SEQ_CST);
+  [[clang::amdgcn_av("none")]] __atomic_fetch_add(ptr, 1, __ATOMIC_ACQ_REL);
+  }];
----------------
ssahasra wrote:

I confused your question with _cooperative atomics_. The answer to the 
actual question is yes: this new metadata is designed to improve performance 
when used in combination with the `__builtin_amdgcn_av_*_b128` builtins that 
we upstreamed in #199176. Those Clang A/V builtins and the "non-av" attribute 
under review are lowered to the A/V intrinsics and the "non-av" metadata 
respectively, and both are tied together in the memory model:

https://llvm.org/docs/AMDGPUMemoryModel.html

So for a given scope S, you can perform a 16-byte-per-lane write using an A/V 
builtin with scope S in C++, followed by a release operation at scope S with 
the "non-av" attribute, and the two will be correctly ordered according to A/V 
semantics. The A/V write writes through to the appropriate cache level, so the 
non-av release performs only a wait; there is no need to write back the entire 
cache.
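
A rough sketch of that pattern, for illustration only. The attribute spelling is taken from the documentation example in the diff above. Everything else is an assumption: `AMDGCN_AV_NONE`, `Payload`, `av_write_placeholder`, and `publish` are hypothetical names, and the placeholder is a plain 16-byte store standing in for the real A/V builtin, whose exact name and signature are not shown here. Scope selection is omitted as well.

```cpp
#include <cassert>
#include <cstdint>

// Apply the attribute only where the compiler recognizes it; on any other
// compiler the macro expands to nothing and a normal release fence remains.
#if defined(__has_cpp_attribute)
#  if __has_cpp_attribute(clang::amdgcn_av)
#    define AMDGCN_AV_NONE [[clang::amdgcn_av("none")]]
#  endif
#endif
#ifndef AMDGCN_AV_NONE
#  define AMDGCN_AV_NONE
#endif

// 16 bytes per lane, matching the size of the write discussed above.
struct alignas(16) Payload {
  std::uint32_t w[4];
};

// Hypothetical stand-in for the A/V write. A real kernel would call one of
// the A/V builtins here; this placeholder is an ordinary store.
inline void av_write_placeholder(Payload *dst, const Payload &src) {
  *dst = src;
}

// Publish a payload: write the data, then issue a release fence marked
// "non-av", then set the flag that consumers poll.
inline void publish(Payload *slot, const Payload &value, int *flag) {
  assert(slot != nullptr && flag != nullptr);
  av_write_placeholder(slot, value);
  // With a real A/V write above, the data has already been written through,
  // so this release is expected to wait only and skip the cache writeback.
  AMDGCN_AV_NONE __atomic_thread_fence(__ATOMIC_RELEASE);
  __atomic_store_n(flag, 1, __ATOMIC_RELAXED);
}
```

With the placeholder store this sketch does not reproduce A/V semantics; it only shows where the attribute sits relative to the write and the flag update.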

https://github.com/llvm/llvm-project/pull/199622