[clang] [llvm] [AMDGPU] Track tensor load/store DMAs with asyncmark (PR #200775)

Sameer Sahasrabuddhe via cfe-commits Thu, 04 Jun 2026 03:25:53 -0700

================
@@ -615,3 +615,38 @@ scale variant.
   matrix A or B data can be reused from a previous WMMA instruction.
 }];
 }
+
+//===----------------------------------------------------------------------===//
+// Tensor DMA Builtins
+//===----------------------------------------------------------------------===//
+
+def DocCatTensorDMA : DocumentationCategory<"Tensor DMA Builtins"> {
+  let Content = [{
+Asynchronous tensor DMA transfers between global memory and LDS, tracked by
+the ``TENSOR_CNT`` hardware counter. The caller must order the transfer
+against later LDS accesses, either via ``s_wait_tensorcnt`` or via
+``__builtin_amdgcn_wait_asyncmark``. See the LLVM ``AMDGPUAsyncOperations``
+document for the async-operation model.
+}];
----------------
ssahasra wrote:


I don't think we should mention ``TENSOR_CNT`` at all, but I am okay if we do.
Instead of specifying any particular "later accesses", just say "track
completion using ``TENSOR_CNT`` or *asyncmarks*". If this is RST, then
*asyncmarks* can be linked to the correct doc using `:ref:`.

https://github.com/llvm/llvm-project/pull/200775
_______________________________________________
cfe-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
