Hi George:

It tainted the PML because the CUDA IPC support uses the large message RDMA 
protocol of the PML layer.  The smcuda btl starts up, but does not initially 
support any of the large message RDMA (RGET, RPUT) protocols.  Then, when a 
GPU buffer is first accessed, the smcuda btl exchanges some control messages 
with its peer.  If the two sides determine that they can support CUDA IPC, 
the smcuda btl calls up into the PML layer and tells it that it is OK to 
start using the large message RDMA.  This all happens in code that is only 
compiled in if the user asks for CUDA-aware support.
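
To make that sequence concrete, here is a rough sketch in C.  It is not the 
actual smcuda or ob1 code: every name in it (the sketch_* functions, the 
SKETCH_FLAG_* bits, the ipc_state_t states) is hypothetical, and the control 
message exchange is collapsed into a direct function call so that the 
example stays self-contained.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bits; these are NOT the real BTL flag names or values. */
enum { SKETCH_FLAG_SEND = 1u << 0, SKETCH_FLAG_RGET = 1u << 1 };

/* Hypothetical per-peer CUDA IPC negotiation state. */
typedef enum {
    IPC_UNKNOWN,      /* no GPU buffer seen yet, nothing negotiated */
    IPC_REQUESTED,    /* control message sent, waiting for the peer */
    IPC_ENABLED,      /* both sides agreed, RDMA may be used */
    IPC_UNSUPPORTED   /* peer said no, stay on the send path */
} ipc_state_t;

typedef struct {
    unsigned    flags;        /* protocols the upper layer may use for this peer */
    ipc_state_t ipc;          /* negotiation state */
    bool        peer_can_ipc; /* stand-in for the peer's reply (assumption) */
} sketch_endpoint_t;

/* Stand-in for the call up into the PML: turn on large message RDMA. */
static void sketch_pml_enable_rdma(sketch_endpoint_t *ep)
{
    ep->flags |= SKETCH_FLAG_RGET;
}

/* Handle the peer's reply to the control message. */
static void sketch_handle_reply(sketch_endpoint_t *ep, bool peer_ok)
{
    assert(ep->ipc == IPC_REQUESTED);
    if (peer_ok) {
        ep->ipc = IPC_ENABLED;
        sketch_pml_enable_rdma(ep);
    } else {
        ep->ipc = IPC_UNSUPPORTED;
    }
}

/* Called whenever a GPU buffer is used with this peer.  Only the first
 * call starts the negotiation; later calls are no-ops. */
static void sketch_on_gpu_buffer(sketch_endpoint_t *ep)
{
    if (ep->ipc != IPC_UNKNOWN) {
        return;
    }
    ep->ipc = IPC_REQUESTED;
    /* A real transport would send a message and get the reply later;
     * here the reply is delivered immediately. */
    sketch_handle_reply(ep, ep->peer_can_ipc);
}
```

The point of the sketch is that the endpoint starts with only the send flag 
set, and the RDMA flag appears only after the first GPU buffer triggers a 
successful negotiation.  If the peer cannot do CUDA IPC, the flags are left 
alone and traffic stays on the send path.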

The key requirement was that I wanted to add the CUDA IPC support 
dynamically, when the user first starts accessing GPU buffers, rather than 
during MPI_Init.  This is the best way I could figure out to accomplish 
that, but I am open to other ideas.

Thanks,
Rolf

>-----Original Message-----
>From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of George Bosilca
>Sent: Thursday, August 22, 2013 11:32 AM
>To: de...@open-mpi.org
>Subject: Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r29055 - in trunk/ompi/mca: btl btl/smcuda common/cuda pml/ob1
>
>I'm not very keen on seeing BTL modifications tainting the PML. I would have
>expected support for IPC between GPUs to be a BTL-level decision, not a
>special path in the PML.
>
>Is there a reason IPC support cannot be hidden down in the SMCUDA BTL?
>
>  Thanks,
>    George.
>
>On Aug 21, 2013, at 23:00 , svn-commit-mai...@open-mpi.org wrote:
>
>> Author: rolfv (Rolf Vandevaart)
>> Date: 2013-08-21 17:00:09 EDT (Wed, 21 Aug 2013) New Revision: 29055
>> URL: https://svn.open-mpi.org/trac/ompi/changeset/29055
>>
>> Log:
>> Fix support in smcuda btl so it does not blow up when there is no CUDA IPC support between two GPUs. Also make it so CUDA IPC support is added dynamically.
>> Fixes ticket 3531.
>>
>> Added:
>>   trunk/ompi/mca/btl/smcuda/README
>> Text files modified:
>>   trunk/ompi/mca/btl/btl.h                         |     2
>>   trunk/ompi/mca/btl/smcuda/README                 |   113 ++++++++++++++++++++++
>>   trunk/ompi/mca/btl/smcuda/btl_smcuda.c           |   104 ++++++++++++++++++++
>>   trunk/ompi/mca/btl/smcuda/btl_smcuda.h           |    28 +++++
>>   trunk/ompi/mca/btl/smcuda/btl_smcuda_component.c |   200 +++++++++++++++++++++++++++++++++++++++
>>   trunk/ompi/mca/btl/smcuda/btl_smcuda_endpoint.h  |     5 +
>>   trunk/ompi/mca/common/cuda/common_cuda.c         |    29 +++++
>>   trunk/ompi/mca/common/cuda/common_cuda.h         |     3
>>   trunk/ompi/mca/pml/ob1/pml_ob1.c                 |    11 ++
>>   trunk/ompi/mca/pml/ob1/pml_ob1_cuda.c            |    42 ++++++++
>>   trunk/ompi/mca/pml/ob1/pml_ob1_recvreq.c         |     6
>>   11 files changed, 535 insertions(+), 8 deletions(-)
>
>_______________________________________________
>devel mailing list
>de...@open-mpi.org
>http://www.open-mpi.org/mailman/listinfo.cgi/devel