WHAT: Add GPU Direct RDMA support to openib btl
WHY: Better latency for small GPU message transfers
WHERE: Several files, see ticket for list
WHEN: Friday, October 18, 2013 COB
More detail:
This RFC makes use of the GPU Direct RDMA support that is coming in a future 
release of the Mellanox libraries.  With GPU Direct RDMA, we can register GPU 
memory with ibv_reg_mr().  We can therefore piggyback on the large message 
RDMA support (RGET) that already exists in the PML and openib BTL.  For best 
performance, we want to use the RGET protocol for small messages and then 
switch to a pipeline protocol for larger messages.
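
As a rough illustration of the protocol choice described above, here is a 
minimal sketch in C.  It is not taken from the patch: the names 
choose_protocol, proto_t and GDR_RGET_LIMIT, and the 30 KB cutoff, are all 
hypothetical and only show the shape of the decision.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical protocol tags; the real PML uses its own identifiers. */
typedef enum { PROTO_DEFAULT, PROTO_RGET, PROTO_PIPELINE } proto_t;

/* Assumed cutoff (illustrative value only) above which GPU messages
 * stop using RGET and fall back to the pipeline protocol. */
#define GDR_RGET_LIMIT ((size_t)30 * 1024)

/* Pick a protocol for one message.
 * - Host buffers, or GPU buffers without GPU Direct RDMA support,
 *   keep whatever the existing code path would have chosen.
 * - GPU buffers with support enabled use RGET up to the cutoff,
 *   then the pipeline protocol. */
static proto_t choose_protocol(bool is_gpu_buffer, bool gdr_enabled,
                               size_t len)
{
    if (!is_gpu_buffer || !gdr_enabled) {
        return PROTO_DEFAULT;
    }
    return (len <= GDR_RGET_LIMIT) ? PROTO_RGET : PROTO_PIPELINE;
}
```

With these assumed values, a 1 KB GPU message would go out via RGET and a 
1 MB GPU message via the pipeline protocol, while host messages are left 
alone.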

To make use of this, we add some extra code paths that are followed when moving 
GPU buffers.  If the support is compiled in, then when we detect a GPU buffer 
we use the RGET protocol even for small messages.  When the messages get 
larger, we switch to the regular pipeline protocol.  Some other support code 
is added as well.  We add a flag to any GPU memory that is registered so we 
can check for cuMemAlloc/cuMemFree/cuMemAlloc issues, where a freed address 
may be handed out again by a later allocation.  Each GPU buffer has a buffer 
ID associated with it, so we can ensure that any registrations in the rcache 
are still valid.
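
To make the buffer ID check concrete, here is a small sketch in C of how a 
cached registration might be validated.  The struct reg_entry_t and the 
function reg_still_valid are hypothetical and do not come from the patch.  
The current buffer ID is passed in as a plain integer; in real code it would 
presumably be queried from the CUDA driver, which is an assumption here.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical registration cache entry (not the real rcache type). */
typedef struct {
    uintptr_t base;       /* start of the registered region */
    size_t    len;        /* length of the registered region */
    bool      is_gpu;     /* flag set when GPU memory was registered */
    uint64_t  buffer_id;  /* buffer ID recorded at registration time */
} reg_entry_t;

/* Return true if the cached registration can be reused for [addr, addr+len).
 * For GPU memory the recorded buffer ID must match the current one;
 * a mismatch means the address was freed and allocated again, so the
 * cached registration is stale. */
static bool reg_still_valid(const reg_entry_t *reg, uintptr_t addr,
                            size_t len, uint64_t current_id)
{
    /* The request must lie entirely inside the registered region. */
    if (addr < reg->base || len > reg->len ||
        addr - reg->base > reg->len - len) {
        return false;
    }
    if (!reg->is_gpu) {
        return true;  /* host memory: no buffer ID to compare */
    }
    return reg->buffer_id == current_id;
}
```

On a mismatch the caller would be expected to drop the stale entry and 
register the buffer again before starting the RDMA.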

To view the changes, go to https://svn.open-mpi.org/trac/ompi/ticket/3836 and 
click on gdr.diff.


