randomkang commented on PR #3144:
URL: https://github.com/apache/brpc/pull/3144#issuecomment-3650933774

   > I believe the long-term solution is to have users register memory with 
RDMA, allowing users to customize this memory based on their data organization 
methods. https://zhuanlan.zhihu.com/p/376989325 This link contains some 
relevant details.
   
   In this PR, brpc receives data into fragmented GPU blocks, and the user must
call IOBuf::copy_from_gpu to copy these blocks into a contiguous HBM buffer
before using the data. This device-to-device (d2d) copy is time-consuming and
unnecessary.
   
   In the future, we can let the user specify the GPU destination directly
through an interface like "rdma_memory_pool_user_specified_memory" and receive
the data into it with an opcode such as IBV_WR_RDMA_READ or IBV_WR_RDMA_WRITE.
Then we can skip the d2d copy.
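
To make the difference between the two receive paths concrete, here is a rough, host-only sketch. It is not brpc or ibverbs code: plain host memory stands in for HBM, std::memcpy stands in for both the d2d copy and the one-sided RDMA transfer, and every name (kBlockSize, recv_into_blocks, gather, recv_direct) is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical block size of the receive pool (tiny, for illustration only).
constexpr std::size_t kBlockSize = 4;

// Path 1, step 1: the payload lands in fixed-size pool blocks, so it ends up
// fragmented across several separate allocations.
std::vector<std::string> recv_into_blocks(const std::string& payload) {
    std::vector<std::string> blocks;
    for (std::size_t off = 0; off < payload.size(); off += kBlockSize) {
        blocks.push_back(payload.substr(off, kBlockSize));
    }
    return blocks;
}

// Path 1, step 2: gather the fragments into one contiguous buffer. This is
// the extra pass that corresponds to the d2d copy. Returns bytes copied.
std::size_t gather(const std::vector<std::string>& blocks,
                   char* dst, std::size_t dst_capacity) {
    std::size_t copied = 0;
    for (const auto& block : blocks) {
        assert(copied + block.size() <= dst_capacity);
        std::memcpy(dst + copied, block.data(), block.size());
        copied += block.size();
    }
    return copied;
}

// Path 2: the payload is placed directly into the destination chosen by the
// user, as a one-sided RDMA READ/WRITE into registered memory would do.
// There is no intermediate block list and no gather pass.
void recv_direct(const std::string& payload,
                 char* user_dst, std::size_t dst_capacity) {
    assert(payload.size() <= dst_capacity);
    std::memcpy(user_dst, payload.data(), payload.size());
}
```

Both paths end with the same bytes in a contiguous buffer; the second simply never touches each byte a second time, which is the saving proposed here.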
   
   Furthermore, we can parse the brpc protocol in a GPU kernel and initiate RDMA
communication from the GPU. Then both the control path and the data path are on
the GPU, as in NCCL GIN.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


