zhuwenxi commented on a change in pull request #7619:
URL: https://github.com/apache/tvm/pull/7619#discussion_r594921470
##########
File path: src/tir/transforms/lower_tvm_builtin.cc
##########
@@ -74,12 +59,38 @@ class BuiltinLower : public StmtExprMutator {
ICHECK_EQ(run_array_stack_, 0);
if (prep_seq_.size() != 0) {
- Stmt ret = SeqStmt::Flatten(prep_seq_, stmt);
+ stmt = SeqStmt::Flatten(prep_seq_, stmt);
prep_seq_.clear();
- return ret;
- } else {
- return stmt;
}
+
+ // Always generate the "tvm_stack_alloca" intrinsic next to the "tvm_packed_func",
+ // which makes the stacks thread-local, so every tvm_packed_func has
+ // its own stack rather than a shared one. This could help resolve the
+ // race-condition issue in parallel execution.
+
+ if (emit_stack_shape_) {
+ ICHECK_NE(max_shape_stack_, -1);
Review comment:
Looks good to me. It's actually quite similar to the approach I proposed,
"re-allocate stack only in a parallel for loop":
https://github.com/apache/tvm/issues/7246#issuecomment-759976432
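
To illustrate why a per-call stack avoids the race, here is a minimal standalone sketch. It is not TVM code: `FakePackedFunc` and `CountMismatchesWithLocalStacks` are hypothetical names invented for this example, and plain `std::thread` workers stand in for a parallel for loop. Each worker allocates its argument buffer right before the call, so no other thread can overwrite the arguments between the write and the call.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical stand-in for a packed function: it reads its arguments
// from a caller-provided stack and returns their sum.
int64_t FakePackedFunc(const int64_t* stack, int num_args) {
  assert(num_args > 0);
  int64_t sum = 0;
  for (int i = 0; i < num_args; ++i) sum += stack[i];
  return sum;
}

// Runs `num_threads` workers. Each call site allocates its own argument
// stack right before the call (the analogue of emitting the stack
// allocation next to the packed call), so no two threads share a buffer.
// Returns the number of calls that observed unexpected arguments.
int CountMismatchesWithLocalStacks(int num_threads, int iters) {
  std::atomic<int> mismatches{0};
  std::vector<std::thread> workers;
  for (int t = 0; t < num_threads; ++t) {
    workers.emplace_back([t, iters, &mismatches]() {
      for (int i = 0; i < iters; ++i) {
        int64_t stack[2];  // private to this worker and this call
        stack[0] = t;
        stack[1] = i;
        if (FakePackedFunc(stack, 2) != static_cast<int64_t>(t) + i) {
          mismatches.fetch_add(1);
        }
      }
    });
  }
  for (auto& w : workers) w.join();
  return mismatches.load();
}
```

Calling `CountMismatchesWithLocalStacks(4, 1000)` should report no mismatches, because every argument buffer is private to its worker. If the buffer were hoisted out of the lambda and shared by all workers, the writes would race with the reads inside the call, which is the failure mode a shared stack would be exposed to.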