RyanRio wrote:

> > Petition to remove
> > ```
> > def __builtin_amdgcn_s_wait_asynccnt : AMDGPUBuiltin<"void(_Constant unsigned short)", [], "gfx1250-insts">;
> > def __builtin_amdgcn_s_wait_tensorcnt : AMDGPUBuiltin<"void(_Constant unsigned short)", [], "gfx1250-insts">;
> > ```
> > now? Any opposed? @krzysz00 @arsenm @ssahasra @shiltian Any folks not using asyncmark already really should be.
>
> Maybe I am wrong but with asyncmarks/async_wait we are unable to specify the memory instruction type (HW counter)? e.g. in Triton for persistent kernels we generate sequences like:
>
> ```assembly
> llvm.amdgcn.global.store.async.to.lds.b32
> async_mark
> llvm.amdgcn.tensor.load.to.lds
> async_mark
>
> async_wait -> just on the tensor_load?
> ```
>
> Since the global_async_store can finish out of order with the tensor_load we want to sync the global async store at a much later stage. Is this possible with the current API? If not, then Triton cannot use async_marks in its current form and we will require a bitset on the async_wait to define which memory types, tensor memory ops and/or global (async) memory ops, we want to sync.
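To make the quoted request concrete, here is a toy Python model of a wait that takes a bitset selecting which kinds of memory operation to sync. It is only a sketch under assumptions: `Counter`, `ToyQueue`, and the bit values are hypothetical names made up for illustration and do not correspond to any LLVM intrinsic, Clang builtin, or hardware encoding.

```python
from collections import deque
from enum import IntFlag


class Counter(IntFlag):
    """Hypothetical bit assignments; not taken from any LLVM definition."""
    ASYNC = 1   # stands in for global async stores to LDS
    TENSOR = 2  # stands in for tensor loads to LDS


class ToyQueue:
    """Toy model: one FIFO of outstanding ops per counter kind."""

    def __init__(self):
        self.pending = {Counter.ASYNC: deque(), Counter.TENSOR: deque()}

    def issue(self, kind, name):
        """Record an operation as outstanding on the given counter."""
        self.pending[kind].append(name)

    def wait(self, mask):
        """Retire every outstanding op whose kind is selected by mask."""
        retired = []
        for kind, ops in self.pending.items():
            if kind & mask:
                retired.extend(ops)
                ops.clear()
        return retired

    def outstanding(self):
        """List all ops that have not been waited on yet."""
        return [op for ops in self.pending.values() for op in ops]


q = ToyQueue()
q.issue(Counter.ASYNC, "global.store.async.to.lds.b32")
q.issue(Counter.TENSOR, "tensor.load.to.lds")

# Sync only the tensor load now; the async store stays in flight.
first = q.wait(Counter.TENSOR)
still_pending = q.outstanding()

# Sync the async store at a much later stage.
later = q.wait(Counter.ASYNC)
```

In this model the first wait retires only the tensor load, which is the behavior the Triton sequence above is asking for. How a real immediate argument would be encoded is a design question for the linked PR, not something this sketch settles.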
Technically, if you use two marks you can wait on just one of them and get the desired behavior, but the general consensus from @ssahasra, @nhaehnle, et al. is to provide an imm argument to make this more straightforward.

https://github.com/llvm/llvm-project/pull/200775
_______________________________________________
cfe-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
