gemini-code-assist[bot] commented on code in PR #18876:
URL: https://github.com/apache/tvm/pull/18876#discussion_r2892004228
##########
src/s_tir/transform/lower_thread_allreduce.cc:
##########
@@ -76,16 +76,32 @@ class ThreadAllreduceBuilder final : public StmtExprMutator
{
}
}
Stmt VisitStmt_(const AllocBufferNode* op) final {
+ // In flat IR, alloc_remap_ may not yet be populated when this AllocBuffer
is visited
+ // (the remap is set up by MakeAllreduce which runs during
AttrStmt/Evaluate visit
+ // that appears later in the sequence). We record the original data
pointer and
+ // attempt the remap; if it's not ready, the post-processing pass will
handle it.
+ const VarNode* orig_data_ptr = op->buffer->data.get();
auto node = Downcast<AllocBuffer>(StmtExprMutator::VisitStmt_(op));
- if (auto it = alloc_remap_.find(node->buffer->data.get()); it !=
alloc_remap_.end()) {
- Buffer buf = Downcast<Buffer>(it->second);
- auto write_ptr = node.CopyOnWrite();
- write_ptr->buffer = buf;
+ if (auto it = alloc_remap_.find(orig_data_ptr); it != alloc_remap_.end()) {
+ return RemapAllocBuffer(node, it->second);
+ }
+ // Record for deferred remapping (flat IR case)
+ pending_alloc_buffers_.emplace_back(orig_data_ptr);
+ return node;
+ }
- if (buf.scope() == "shared") {
- write_ptr->body = AttrStmt(buf->data, tir::attr::volatile_scope, 1,
write_ptr->body);
- }
+ /*!
+ * \brief Remap an AllocBuffer node to use the replacement buffer.
+ * \param node The original AllocBuffer node.
+ * \param replacement The replacement buffer.
+ * \return The remapped statement(s).
+ */
+ Stmt RemapAllocBuffer(AllocBuffer node, const Buffer& replacement) {
+ node.CopyOnWrite()->buffer = replacement;
+ if (replacement.scope() == "shared") {
+ Stmt volatile_attr = AttrStmt(replacement->data,
tir::attr::volatile_scope, 1, Evaluate(0));
+ return SeqStmt::Flatten(node, volatile_attr);
Review Comment:

The `volatile_scope` attribute is being applied to an `AttrStmt` with an
empty body (`Evaluate(0)`). This means the attribute will have no effect, as it
doesn't wrap any statements that use the volatile buffer.
With the old `AllocBuffer` node, the `volatile_scope` attribute correctly
wrapped the body of the allocation, where the buffer was used. After this
refactoring to a flat `AllocBuffer`, the statements that use the buffer are now
siblings in a `SeqStmt`.
This is a critical issue as it can lead to incorrect code generation on
GPUs. The compiler might perform invalid optimizations (e.g., reordering memory
accesses across a sync point) if it's not informed that the shared memory is
volatile.
A potential fix would be to restructure this pass to identify the scope of
usage for the shared buffer and wrap that scope with the `AttrStmt`. A similar
pattern is used in `src/s_tir/transform/lower_warp_memory.cc`, where
`VisitStmt_(const SeqStmtNode* op)` is used to find an allocation and treat
subsequent statements as its body for rewriting. A similar approach might be
needed here, possibly in the `DeferredRemapper`.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]