masahi commented on code in PR #12171:
URL: https://github.com/apache/tvm/pull/12171#discussion_r929804656


##########
src/tir/transforms/thread_storage_sync.cc:
##########
@@ -384,6 +426,9 @@ class ThreadSyncInserter : public StmtExprMutator {
 
 Stmt ThreadSync(Stmt stmt, std::string storage_scope) {
   StorageScope sync_scope = StorageScope::Create(storage_scope);
+  if (sync_scope.rank == StorageRank::kShared && sync_scope.tag == "") {

Review Comment:
   This is only for making sure that this code path is hit only once. 
`ThreadSyncAfterWaitQueueInserter` just looks for `async_wait_queue_scope` and 
inserts `syncthreads` after it. So assuming that all shared memory, including 
dynamic ones, are protected by `async_wait_queue_scope` (which should be the 
case by `InjectSoftwarePipeline`), all necessary `syncthreads` will be 
inserted.  
   
    Since `ThreadSync` is called twice, for `shared` and `shared.dyn`, 
https://github.com/apache/tvm/blob/7ef6811000a3752e843b333fd5743ea9ef6c049e/src/driver/driver_api.cc#L530-L531,
 we get two `syncthreads` without this check.
   
   Thinking about it more now, this assumes that `async_wait_queue_scope` on 
GPU is always associated with shared memory. This should be fine as long as the 
only async operation is copying into shared memory. I have to admit this is a 
bit hacky, but something like this is needed for correctness.  



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to