================
@@ -615,3 +615,38 @@ scale variant.
matrix A or B data can be reused from a previous WMMA instruction.
}];
}
+
+//===----------------------------------------------------------------------===//
+// Tensor DMA Builtins
+//===----------------------------------------------------------------------===//
+
+def DocCatTensorDMA : DocumentationCategory<"Tensor DMA Builtins"> {
+ let Content = [{
+Asynchronous tensor DMA transfers between global memory and LDS, tracked by
+the ``TENSOR_CNT`` hardware counter. The caller must order the transfer
+against later LDS accesses, either via ``s_wait_tensorcnt`` or via
+``__builtin_amdgcn_wait_asyncmark``. See the LLVM ``AMDGPUAsyncOperations``
+document for the async-operation model.
+}];
----------------
ssahasra wrote:
I don't think we should mention ``TENSOR_CNT`` at all, but I am okay if we do.
Instead of singling out any particular "later accesses", just say "track
completion using ``TENSOR_CNT`` or *asyncmarks*". If this is RST, *asyncmarks*
can be linked to the correct doc using `:ref:`.
https://github.com/llvm/llvm-project/pull/200775
_______________________________________________
cfe-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits