MasterJH5574 opened a new pull request, #16759: URL: https://github.com/apache/tvm/pull/16759
This PR introduces the lowering passes for GPU IPC memory and all-reduce. It contains the following changes:

1. A pass `IPCAllreduceRewrite`, which rewrites `"runtime.disco.allreduce"` to `"runtime.disco.cuda_ipc.custom_allreduce"` and accordingly rewrites the storage scope of the all-reduce inputs from `"global"` to `"ipc_memory"`.
2. A memory-planning enhancement that makes the planner aware of storage scopes, so that each storage scope is planned independently.
3. A pass `LowerGPUIPCAllocStorage`, which rewrites the storage allocation of IPC memory from builtin ops to calls to the function `"runtime.disco.cuda_ipc.alloc_storage"`.
4. Support for the op `relax.builtin.alloc_tensor` with a storage scope. The default storage scope is `"global"`.

We wrote the new passes in Python for experimentation and fast development. They are good demonstrations of the efficient development enabled by TVM's architecture.
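The rewrite in change 1 can be sketched without TVM as a small framework-free Python pass over a toy call-node IR (the `Buffer`/`Call` classes here are hypothetical stand-ins for Relax IR nodes; the real pass operates on Relax IR inside TVM):

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Buffer:
    """Toy stand-in for a tensor value with a storage scope."""
    name: str
    scope: str = "global"

@dataclass(frozen=True)
class Call:
    """Toy stand-in for a call node in the IR."""
    op: str
    args: tuple

def ipc_allreduce_rewrite(calls):
    """Rewrite allreduce calls to the CUDA IPC variant and move their
    buffer inputs from the "global" to the "ipc_memory" storage scope."""
    out = []
    for call in calls:
        if call.op == "runtime.disco.allreduce":
            new_args = tuple(
                replace(a, scope="ipc_memory") if isinstance(a, Buffer) else a
                for a in call.args
            )
            out.append(Call("runtime.disco.cuda_ipc.custom_allreduce", new_args))
        else:
            out.append(call)
    return out
```

Calls other than `"runtime.disco.allreduce"` pass through unchanged, so their buffers keep the default `"global"` scope.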
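The scope-aware planning in change 2 amounts to never sharing storage across scopes: requests are grouped by scope and each group is planned on its own. A minimal sketch, using a deliberately naive planner (one reusable pool per scope, sized to that scope's largest request) rather than TVM's actual planning algorithm:

```python
from collections import defaultdict

def plan_storage(allocs):
    """Plan storage independently per scope.

    allocs: iterable of (name, scope, nbytes) requests.
    Returns {scope: pool_size_in_bytes}. Buffers in different scopes
    (e.g. "global" vs "ipc_memory") never share a pool.
    """
    by_scope = defaultdict(list)
    for name, scope, nbytes in allocs:
        by_scope[scope].append((name, nbytes))
    # Toy policy: one pool per scope, sized to the largest request in it.
    return {scope: max(n for _, n in reqs) for scope, reqs in by_scope.items()}
```

With this grouping in place, an `"ipc_memory"` allocation can then be lowered to `"runtime.disco.cuda_ipc.alloc_storage"` (change 3) without disturbing the `"global"` pool.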
