================
@@ -3577,6 +3577,29 @@ An error will be given if:
}];
}
+def AMDGCNAVDocs : Documentation {
+ let Category = DocCatAMDGPUAttributes;
+ let Content = [{
+This attribute controls MakeAvailable and MakeVisible cache operations on
+AMDGPU synchronization operations. It takes a string argument specifying the
+mode.
+
+When placed on a statement containing a C/C++ atomic builtin call, the
+resulting atomic or fence instruction will carry ``!mmra !{!"amdgcn-av",
+!"<mode>"}`` metadata.
+
+The supported modes are:
+
+- ``"none"``: Skip cache writeback (on release) and cache invalidation (on
+ acquire), while preserving memory ordering (waits).
+
+.. code-block:: c++
+
+ [[clang::amdgcn_av("none")]] __atomic_thread_fence(__ATOMIC_SEQ_CST);
+ [[clang::amdgcn_av("none")]] __atomic_fetch_add(ptr, 1, __ATOMIC_ACQ_REL);
+ }];
----------------
ssahasra wrote:
So I confused your question with _cooperative atomics_. The answer to the
actual question is yes, this new metadata is designed to improve performance
when used in combination with `__builtin_amdgcn_av_*_b128` that we upstreamed
in #199176. Those Clang A/V builtins and this "non-av" attribute being reviewed
get lowered to the A/V intrinsics and the "non-av" metadata respectively, which
is all tied together in the memory model:
https://llvm.org/docs/AMDGPUMemoryModel.html
So for a given scope S, you can perform a 16-byte-per-lane write using an A/V
builtin with scope S in C++, followed by a release operation at scope S
with the "non-av" attribute, and it will be correctly ordered according to A/V
semantics. The A/V write will write through to the appropriate cache level, and
the non-av release will perform only a wait since there is no need to write
back the entire cache.
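
As a rough illustration of that pattern (a sketch only, not code from the patch): the names `AV_NONE`, `Payload`, `publish` and `try_consume` are made up for this example, and the plain assignment to `g_data` is just a placeholder for the A/V write builtin, whose exact name and signature I am not spelling out here. The attribute is applied only if the compiler reports support for it when targeting AMDGPU; elsewhere the macro expands to nothing and this is an ordinary release/acquire fence pair.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical helper macro: use the attribute under review only when the
// compiler reports it for an AMDGPU target; otherwise expand to nothing.
#if defined(__AMDGCN__) && defined(__has_cpp_attribute)
#  if __has_cpp_attribute(clang::amdgcn_av)
#    define AV_NONE [[clang::amdgcn_av("none")]]
#  endif
#endif
#ifndef AV_NONE
#  define AV_NONE
#endif

// 16-byte payload, standing in for the per-lane b128 data.
struct Payload { std::uint32_t w[4]; };
static_assert(sizeof(Payload) == 16, "payload should be 16 bytes");

static Payload g_data;
static std::atomic<int> g_flag{0};

void publish(const Payload &p) {
  // Placeholder: on AMDGPU this would have to be the A/V write builtin at
  // scope S. A plain store is NOT sufficient there, because the "none"
  // release below skips the cache writeback.
  g_data = p;
  // Release at the same scope: ordering (waits) only, no writeback.
  AV_NONE __atomic_thread_fence(__ATOMIC_RELEASE);
  g_flag.store(1, std::memory_order_relaxed);
}

bool try_consume(Payload &out) {
  if (g_flag.load(std::memory_order_relaxed) == 0)
    return false;
  // Acquire side: with "none", no cache invalidation, so on AMDGPU the read
  // below would likewise need the matching A/V read builtin.
  AV_NONE __atomic_thread_fence(__ATOMIC_ACQUIRE);
  out = g_data;
  return true;
}
```

On a host compiler this only shows the shape of the code; the performance benefit and the A/V ordering guarantees apply only on AMDGPU with the real builtins.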
https://github.com/llvm/llvm-project/pull/199622