MasterJH5574 opened a new pull request, #16750:
URL: https://github.com/apache/tvm/pull/16750

   This PR introduces CUDA IPC memory support in the TVM runtime. IPC memory 
allows multiple distributed workers to access each other's GPU memory 
directly. This functionality is helpful for implementing customized 
communication primitives across distributed workers.
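   For context, the CUDA IPC mechanism works by exporting a `cudaIpcMemHandle_t` for a device allocation in one process and opening that handle in a peer process. A minimal sketch of that exchange is below; error handling is elided, the function names are illustrative, and the transport of the handle between workers (e.g. over a socket or a collective) is assumed to happen out of band:
   
   ```cuda
   #include <cuda_runtime.h>
   
   // Worker A: allocate device memory and export an IPC handle for it.
   // The handle is a small POD struct that can be copied to other processes.
   cudaIpcMemHandle_t export_buffer(void** d_buf, size_t nbytes) {
     cudaMalloc(d_buf, nbytes);
     cudaIpcMemHandle_t handle;
     cudaIpcGetMemHandle(&handle, *d_buf);
     return handle;  // send to peer workers out of band
   }
   
   // Worker B: map A's allocation into this process's address space.
   void* import_buffer(const cudaIpcMemHandle_t& handle) {
     void* d_peer = nullptr;
     cudaIpcOpenMemHandle(&d_peer, handle, cudaIpcMemLazyEnablePeerAccess);
     return d_peer;  // kernels launched here can now access A's memory directly
   }
   
   // Worker B, when done: unmap the peer allocation (A still owns it).
   void release_buffer(void* d_peer) { cudaIpcCloseMemHandle(d_peer); }
   ```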
   
   In this PR, we bring the customized all-reduce implementation from 
TensorRT-LLM into 3rdparty. This all-reduce implementation makes use of 
CUDA IPC memory. We expose the all-reduce function as a global function 
under the namespace `tvm::runtime::disco::cuda_ipc`.
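   In the one-shot variant of this style of all-reduce, each rank reads every peer's IPC-mapped buffer directly and accumulates the sum locally, avoiding a separate communication library round trip. A host-side sketch of that access pattern, with plain vectors standing in for the IPC-mapped device buffers (purely illustrative; not the actual TensorRT-LLM kernel):
   
   ```cpp
   #include <cstddef>
   #include <vector>
   
   // Simulated one-shot all-reduce: every rank sees the buffers of all
   // peers (as it would via IPC-mapped pointers) and sums them elementwise.
   std::vector<float> one_shot_all_reduce(
       const std::vector<std::vector<float>>& peer_bufs) {
     const std::size_t n = peer_bufs.at(0).size();
     std::vector<float> out(n, 0.0f);
     for (const auto& buf : peer_bufs) {
       for (std::size_t i = 0; i < n; ++i) out[i] += buf[i];
     }
     return out;  // each rank computes the same reduced result
   }
   ```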
   
   One unit test for the customized all-reduce kernel over two workers is added.
   
   ---
   
   Co-authored-by: Hongyi Jin <[email protected]>

