randomkang commented on PR #3144: URL: https://github.com/apache/brpc/pull/3144#issuecomment-3650933774
> I believe the long-term solution is to have users register memory with RDMA, allowing users to customize this memory based on their data organization methods. https://zhuanlan.zhihu.com/p/376989325 This link contains some relevant details.

In this PR, brpc receives data into fragmented GPU blocks, and the user must call IOBuf::copy_from_gpu to copy those blocks into a contiguous HBM buffer before using the data. This device-to-device (d2d) copy is time-consuming and unnecessary. In the future, we can let the user specify the GPU destination directly through an interface like "rdma_memory_pool_user_specified_memory" and receive the data into it with an opcode such as IBV_WR_RDMA_READ or IBV_WR_RDMA_WRITE, which skips the d2d copy. Furthermore, we can parse the brpc protocol in a GPU kernel and initiate RDMA communication from the GPU, so that both the control path and the data path run on the GPU, like NCCL GIN.

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
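To make the two receive paths in the comment above concrete, here is a minimal host-only sketch. It is an illustration under stated assumptions, not brpc code: plain host memory stands in for GPU memory, std::memcpy stands in for both the NIC placement and the d2d copy, and every name in it (Block, receive_fragmented, gather_into_contiguous, receive_direct) is hypothetical. It only counts how many bytes need a second copy pass in each path.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical stand-in for one receive block. In the PR these would be GPU
// memory blocks; host memory is used here so the sketch runs anywhere.
struct Block {
    std::vector<unsigned char> bytes;
};

// Simulates receiving a payload into a pool of fixed-size blocks: the data
// ends up scattered across blocks of at most block_size bytes each.
std::vector<Block> receive_fragmented(const unsigned char* payload,
                                      std::size_t len,
                                      std::size_t block_size) {
    assert(block_size > 0);
    std::vector<Block> blocks;
    for (std::size_t off = 0; off < len; off += block_size) {
        const std::size_t n = std::min(block_size, len - off);
        blocks.push_back(
            Block{std::vector<unsigned char>(payload + off, payload + off + n)});
    }
    return blocks;
}

// Current path: gather the scattered blocks into one contiguous destination.
// This models the extra copy the comment attributes to IOBuf::copy_from_gpu.
// Returns the number of bytes moved by this extra pass.
std::size_t gather_into_contiguous(const std::vector<Block>& blocks,
                                   unsigned char* dst, std::size_t dst_cap) {
    std::size_t off = 0;
    for (const Block& b : blocks) {
        assert(off + b.bytes.size() <= dst_cap);
        if (!b.bytes.empty()) {
            std::memcpy(dst + off, b.bytes.data(), b.bytes.size());
        }
        off += b.bytes.size();
    }
    return off;
}

// Proposed path: the transfer lands directly in the user-specified
// destination (the role of RDMA READ/WRITE in the comment). The memcpy here
// models the placement itself, so no extra copy pass is needed afterwards.
// Returns the number of bytes moved by an extra pass, which is always 0.
std::size_t receive_direct(const unsigned char* payload, std::size_t len,
                           unsigned char* dst, std::size_t dst_cap) {
    assert(len <= dst_cap);
    std::memcpy(dst, payload, len);
    return 0;
}
```

In this model the fragmented path moves every payload byte a second time, while the direct path moves none, which is the saving the comment is after. A real implementation would additionally need the destination to be registered with the RDMA device and its address and key exchanged with the peer; those steps are outside this sketch.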
