================
@@ -1708,6 +1710,19 @@ bool SIInsertWaitcnts::insertWaitcntInBlock(MachineFunction &MF,
     }
 
     ++Iter;
+    if (ST->isPreciseMemoryEnabled() && Inst.mayLoadOrStore()) {
+      auto Builder =
+          BuildMI(Block, Iter, DebugLoc(), TII->get(AMDGPU::S_WAITCNT))
+              .addImm(0);
+      if (IsGFX10Plus) {
----------------
jwanggit86 wrote:

My understanding is that the feature request asks for an "s_waitcnt 0" to be
*blindly* inserted after each and every memory instruction. The feature is
enabled at the user's discretion via a clang command-line option and is
disabled by default. Its purpose is to help debug memory problems on GPUs that
do not support precise memory (although someone, Tony I think, mentioned that
it could go beyond debugging). I'll send you the link to the feature request.

Based on that, the implementation doesn't check the GPU model, doesn't have
model-dependent code (except the newly added code for GFX10+), and doesn't
differentiate loads from stores. I'll work with the requester to get the
requirements straightened out.
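
To make the "blind" insertion concrete, here is a minimal standalone sketch of
the behavior as I understand it. It is a toy model, not the pass itself:
ToyInst and insertPreciseMemoryWaits are hypothetical names, the block is a
plain std::list rather than a MachineBasicBlock, and the GFX10+ special case
from the patch is left out.

```cpp
#include <iterator>
#include <list>
#include <string>

// Hypothetical stand-in for a machine instruction; not an LLVM type.
struct ToyInst {
  std::string Opcode;
  bool MayLoadOrStore;
};

// Insert an "s_waitcnt 0" after every instruction that may load or store.
// No check on the GPU model, and no distinction between loads and stores.
// Returns the number of waits inserted.
inline int insertPreciseMemoryWaits(std::list<ToyInst> &Block,
                                    bool PreciseMemoryEnabled) {
  if (!PreciseMemoryEnabled)
    return 0;

  int Inserted = 0;
  for (auto Iter = Block.begin(); Iter != Block.end();) {
    const bool IsMemOp = Iter->MayLoadOrStore;
    ++Iter; // Advance first, as the patch does, so the wait lands after Inst.
    if (IsMemOp) {
      // std::list::insert places the new element before Iter, i.e. right
      // after the memory instruction, and leaves Iter valid.
      Block.insert(Iter, ToyInst{"s_waitcnt 0", false});
      ++Inserted;
    }
  }
  return Inserted;
}
```

In this model, a block holding a load, an ALU instruction, and a store ends up
with two extra waits, one directly after each memory instruction, and is left
untouched when the option is off.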

https://github.com/llvm/llvm-project/pull/68932
_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
