[clang] [llvm] [AMDGPU] Add async variants of tensor load/store LDS intrinsics. (PR #200775)

Krzysztof Drewniak via cfe-commits Tue, 02 Jun 2026 21:28:25 -0700

krzysz00 wrote:

> I have been assuming that tensor operations are also covered automatically.


I went and double-checked with
`ag --ignore-case tensor_cnt llvm/lib/Target/AMDGPU` and
`ag --ignore-case tensorcnt llvm/lib/Target/AMDGPU`, neither of which revealed
anything in SIInsertWaitcnts or the like that points to that sort of "magic"
handling.

The reason I've been pushing to get this added to asyncmark/asyncwait is that 
Triton *et al.* are already using asyncmark/asyncwait for the 
`global.*.async.*` intrinsics, yet they need an entirely separate pass to 
manually count operations that use tensorcnt, even though one of the intended 
purposes of asyncmark/asyncwait was to abstract away from the programmer the 
hardware counters needed for this sort of software pipelining.

https://github.com/llvm/llvm-project/pull/200775
_______________________________________________
cfe-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
