gemini-code-assist[bot] commented on code in PR #18876:
URL: https://github.com/apache/tvm/pull/18876#discussion_r2892004228


##########
src/s_tir/transform/lower_thread_allreduce.cc:
##########
@@ -76,16 +76,32 @@ class ThreadAllreduceBuilder final : public StmtExprMutator 
{
     }
   }
   Stmt VisitStmt_(const AllocBufferNode* op) final {
+    // In flat IR, alloc_remap_ may not yet be populated when this AllocBuffer 
is visited
+    // (the remap is set up by MakeAllreduce which runs during 
AttrStmt/Evaluate visit
+    // that appears later in the sequence). We record the original data 
pointer and
+    // attempt the remap; if it's not ready, the post-processing pass will 
handle it.
+    const VarNode* orig_data_ptr = op->buffer->data.get();
     auto node = Downcast<AllocBuffer>(StmtExprMutator::VisitStmt_(op));
 
-    if (auto it = alloc_remap_.find(node->buffer->data.get()); it != 
alloc_remap_.end()) {
-      Buffer buf = Downcast<Buffer>(it->second);
-      auto write_ptr = node.CopyOnWrite();
-      write_ptr->buffer = buf;
+    if (auto it = alloc_remap_.find(orig_data_ptr); it != alloc_remap_.end()) {
+      return RemapAllocBuffer(node, it->second);
+    }
+    // Record for deferred remapping (flat IR case)
+    pending_alloc_buffers_.emplace_back(orig_data_ptr);
+    return node;
+  }
 
-      if (buf.scope() == "shared") {
-        write_ptr->body = AttrStmt(buf->data, tir::attr::volatile_scope, 1, 
write_ptr->body);
-      }
+  /*!
+   * \brief Remap an AllocBuffer node to use the replacement buffer.
+   * \param node The original AllocBuffer node.
+   * \param replacement The replacement buffer.
+   * \return The remapped statement(s).
+   */
+  Stmt RemapAllocBuffer(AllocBuffer node, const Buffer& replacement) {
+    node.CopyOnWrite()->buffer = replacement;
+    if (replacement.scope() == "shared") {
+      Stmt volatile_attr = AttrStmt(replacement->data, 
tir::attr::volatile_scope, 1, Evaluate(0));
+      return SeqStmt::Flatten(node, volatile_attr);

Review Comment:
   ![critical](https://www.gstatic.com/codereviewagent/critical.svg)
   
   The `volatile_scope` attribute is being applied to an `AttrStmt` with an 
empty body (`Evaluate(0)`). This means the attribute will have no effect, as it 
doesn't wrap any statements that use the volatile buffer.
   
   With the old `AllocBuffer` node, the `volatile_scope` attribute correctly 
wrapped the body of the allocation, where the buffer was used. After this 
refactoring to a flat `AllocBuffer`, the statements that use the buffer are now 
siblings in a `SeqStmt`.
   
   This is a critical issue as it can lead to incorrect code generation on 
GPUs. The compiler might perform invalid optimizations (e.g., reordering memory 
accesses across a sync point) if it's not informed that the shared memory is 
volatile.
   
   A potential fix would be to restructure this pass to identify the scope of 
usage for the shared buffer and wrap that scope with the `AttrStmt`. A similar 
pattern is used in `src/s_tir/transform/lower_warp_memory.cc`, where 
`VisitStmt_(const SeqStmtNode* op)` is used to find an allocation and treat 
subsequent statements as its body for rewriting. A similar approach might be 
needed here, possibly in the `DeferredRemapper`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to