[clang] [llvm] [AMDGPU] Track tensor load/store DMAs with asyncmark (PR #200775)

via cfe-commits Thu, 04 Jun 2026 09:50:35 -0700

================
@@ -615,3 +615,38 @@ scale variant.
   matrix A or B data can be reused from a previous WMMA instruction.
 }];
 }
+
+//===----------------------------------------------------------------------===//
+// Tensor DMA Builtins
+//===----------------------------------------------------------------------===//
+
+def DocCatTensorDMA : DocumentationCategory<"Tensor DMA Builtins"> {
+  let Content = [{
+Asynchronous tensor DMA transfers between global memory and LDS, tracked by
+the ``TENSOR_CNT`` hardware counter. The caller must order the transfer
+against later LDS accesses, either via ``s_wait_tensorcnt`` or via
+``__builtin_amdgcn_wait_asyncmark``. See the LLVM ``AMDGPUAsyncOperations``
+document for the async-operation model.
+}];
----------------
adeshcom14 wrote:


Yup

https://github.com/llvm/llvm-project/pull/200775
_______________________________________________
cfe-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [llvm] [AMDGPU] Track tensor load/store DMAs with asyncmark (PR #200775)

Reply via email to