AlexAUT wrote:
> Petition to remove
>
> ```
> def __builtin_amdgcn_s_wait_asynccnt : AMDGPUBuiltin<"void(_Constant unsigned short)", [], "gfx1250-insts">;
> def __builtin_amdgcn_s_wait_tensorcnt : AMDGPUBuiltin<"void(_Constant unsigned short)", [], "gfx1250-insts">;
> ```
>
> now? Any opposed? @krzysz00 @arsenm @ssahasra @shiltian Any folks not using asyncmark already really should be.
Maybe I am wrong, but with asyncmarks/async_wait, are we unable to specify the memory instruction type (HW counter)? For example, in Triton for persistent kernels we generate sequences like:

```asm
llvm.amdgcn.global.store.async.to.lds.b32
async_mark
llvm.amdgcn.tensor.load.to.lds
async_mark
async_wait -> just on the tensor_load?
```

Since the global async store can complete out of order with respect to the tensor_load, we want to sync the global async store at a much later stage. Is this possible with the current API? If not, then Triton cannot use async_marks in their current form, and we will need a bitset on async_wait to select which memory types to sync: tensor memory ops, global (async) memory ops, or both.

https://github.com/llvm/llvm-project/pull/200775

_______________________________________________
cfe-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
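To make the ordering question concrete, here is a toy Python model, not LLVM code. It assumes (this is an assumption, not something stated in the thread) that `async_wait(n)` is a count-based wait: it returns once at most the `n` most recent mark groups may still be outstanding, with groups retired strictly in order. Under that assumption the only way to guarantee the tensor load has finished is `n = 0`, which also waits on the older global async store; a per-counter wait in the style of `s_wait_tensorcnt` has no such coupling. `MarkModel`, `ops_to_wait_for`, and `counter_wait` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class AsyncOp:
    # Hypothetical counter label for the op in this toy model:
    # "global" for the async global/LDS op, "tensor" for the tensor_load.
    kind: str


@dataclass
class MarkModel:
    """Toy model of mark-based waiting (assumed semantics, not LLVM's implementation)."""
    groups: List[List[AsyncOp]] = field(default_factory=list)  # closed groups, oldest first
    pending: List[AsyncOp] = field(default_factory=list)       # ops issued since the last mark

    def issue(self, kind: str) -> AsyncOp:
        op = AsyncOp(kind)
        self.pending.append(op)
        return op

    def async_mark(self) -> None:
        # Close the current group: everything issued since the previous mark.
        self.groups.append(self.pending)
        self.pending = []

    def ops_to_wait_for(self, n: int) -> List[AsyncOp]:
        # Assumed async_wait(n): every group except the n most recent must be complete.
        # Groups retire in order, so an older group can never be skipped.
        older = self.groups[: len(self.groups) - n] if n < len(self.groups) else []
        return [op for group in older for op in group]


def counter_wait(ops: List[AsyncOp], kind: str) -> List[AsyncOp]:
    # Per-counter wait in the style of `s_wait_tensorcnt 0`: only ops of one kind must complete.
    return [op for op in ops if op.kind == kind]


m = MarkModel()
store = m.issue("global")  # llvm.amdgcn.global.store.async.to.lds.b32
m.async_mark()
load = m.issue("tensor")   # llvm.amdgcn.tensor.load.to.lds
m.async_mark()

print([op.kind for op in m.ops_to_wait_for(0)])                  # ['global', 'tensor']
print([op.kind for op in m.ops_to_wait_for(1)])                  # ['global']
print([op.kind for op in counter_wait([store, load], "tensor")])  # ['tensor']
```

In this model, waiting for the tensor load (`n = 0`) always drags in the earlier global store, while `n = 1` waits on the store instead of the load; only the per-counter wait isolates the tensor load, which is the gap a counter-selecting bitset on `async_wait` would close.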
