================
@@ -1708,6 +1710,19 @@ bool SIInsertWaitcnts::insertWaitcntInBlock(MachineFunction &MF,
     }
     ++Iter;
+    if (ST->isPreciseMemoryEnabled() && Inst.mayLoadOrStore()) {
+      auto Builder =
+          BuildMI(Block, Iter, DebugLoc(), TII->get(AMDGPU::S_WAITCNT))
+              .addImm(0);
+      if (IsGFX10Plus) {
----------------
jwanggit86 wrote:

My understanding is that the feature request asks for an "s_waitcnt 0" to be *blindly* inserted after each and every memory instruction. Enabling the feature is at the user's discretion, via a clang command-line option that is disabled by default. The purpose of the feature is to help debug memory problems on GPUs that do not support precise memory (although someone, Tony I think, mentioned it could go beyond debugging). I'll send you the link for the feature request.

Based on that, the implementation doesn't check the GPU model, doesn't have model-dependent code (except the newly added code for GFX10+), and doesn't differentiate loads from stores. I'll work with the requester to get the requirements straightened out.

https://github.com/llvm/llvm-project/pull/68932
_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
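
For readers following the thread, the "blind insertion" behavior described in the comment can be illustrated with a minimal, self-contained sketch in plain C++. It deliberately does not use any LLVM API: `ToyInst` and `insertPreciseMemoryWaits` are hypothetical names invented for this illustration, and the sketch ignores the GFX10+ handling mentioned in the diff. It only models the stated policy: when the option is on, every instruction that may load or store is followed by a wait, with no GPU-model check and no load/store distinction.

```cpp
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for a machine instruction.
// This is NOT an LLVM type; it exists only for this illustration.
struct ToyInst {
  std::string Mnemonic;
  bool MayLoadOrStore;
};

// Returns a copy of Block in which every instruction that may load or
// store is immediately followed by "s_waitcnt 0". The insertion is
// unconditional with respect to GPU model and does not distinguish
// loads from stores, matching the policy described in the comment.
std::vector<ToyInst>
insertPreciseMemoryWaits(const std::vector<ToyInst> &Block,
                         bool PreciseMemoryEnabled) {
  std::vector<ToyInst> Out;
  Out.reserve(Block.size() * 2);
  for (const ToyInst &Inst : Block) {
    Out.push_back(Inst);
    if (PreciseMemoryEnabled && Inst.MayLoadOrStore)
      Out.push_back({"s_waitcnt 0", false});
  }
  return Out;
}
```

With the flag set, a block consisting of a load, an ALU instruction, and a store would come back as five instructions, with a wait after the load and another after the store; with the flag clear, the block is returned unchanged.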