[clang] [llvm] [AMDGPU] Track tensor load/store DMAs with asyncmark (PR #200775)

Sameer Sahasrabuddhe via cfe-commits Thu, 04 Jun 2026 03:25:53 -0700

================
@@ -50,6 +50,19 @@ memory and LDS memory.
   void @llvm.amdgcn.global.store.async.from.lds.type(ptr %dst, ptr %src)
   void @llvm.amdgcn.cluster.load.async.to.lds.type(ptr %dst, ptr %src)
 
+**GFX1250 Tensor DMA Instructions**
+
+.. code-block:: llvm
+
+  void @llvm.amdgcn.tensor.load.to.lds(...)
+  void @llvm.amdgcn.tensor.store.from.lds(...)
+
+These intrinsics are asynchronous despite the absence of ``async`` in their
+names. They are tracked by the ``TENSOR_CNT`` hardware counter and participate
+in the ``asyncmark`` / ``wait.asyncmark`` framework just like the intrinsics
+above. Equivalently, the caller may issue an explicit ``s_wait_tensorcnt``
+instead of using ``asyncmark`` / ``wait.asyncmark``.
----------------
ssahasra wrote:


Remove this whole paragraph. Too much information. The whole point of 
`asyncmark` is to abstract away details like `TENSOR_CNT`. If users need the 
old way of doing things, they will have to go read the ISA doc for that.

https://github.com/llvm/llvm-project/pull/200775
