krzysz00 wrote:

> I have been assuming that tensor operations are also covered automatically.

I went and double-checked with `ag --ignore-case tensor_cnt llvm/lib/Target/AMDGPU` and `ag --ignore-case tensorcnt llvm/lib/Target/AMDGPU`, and neither turned up anything in SIInsertWaitcnts or the like that points to that sort of "magic" handling.

The reason I've been pushing to get this added to asyncmark/asyncwait is that Triton *et al.* are already using asyncmark/asyncwait for the `global.*.async.*` intrinsics and need an entirely separate pass to manually count operations that use tensorcnt, when one of the intended purposes of asyncmark/asyncwait was to abstract away from the programmer the hardware counters needed for this sort of software pipelining.

https://github.com/llvm/llvm-project/pull/200775
_______________________________________________
cfe-commits mailing list
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
