MasterJH5574 opened a new pull request, #16759:
URL: https://github.com/apache/tvm/pull/16759

   This PR introduces the lowering passes for GPU IPC memory and all-reduce. It 
contains the following changes:
   
   1. a pass `IPCAllreduceRewrite`, which rewrites `"runtime.disco.allreduce"` 
to `"runtime.disco.cuda_ipc.custom_allreduce"` and accordingly rewrites the 
storage scope of the all-reduce inputs from `"global"` to `"ipc_memory"`.
   
   2. a memory planning enhancement that makes the planner aware of storage 
scopes, so that each storage scope is planned independently.
   
   3. a pass `LowerGPUIPCAllocStorage` that rewrites the storage allocation of 
IPC memory from builtin ops into calls to the function 
`"runtime.disco.cuda_ipc.alloc_storage"`.
   
   4. support for the op `relax.builtin.alloc_tensor` with an explicit storage 
scope. The default storage scope is `"global"`.
   
   We write the new passes in Python for experimentation and fast development. 
They are a good demonstration that TVM's architecture enables efficient pass 
development in Python.

