[Answering both questions with this email]

These changes depend on new features in CUDA 4.0.  CUDA 4.0 introduces 
Unified Virtual Addressing, so device and host addresses do not overlap; they 
are all unique within the process.  CUDA 4.0 also provides an API that can be 
used to query what type of memory an address points to.
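
For reference, here is a minimal sketch of such a check, assuming the query 
meant here is cuPointerGetAttribute with CU_POINTER_ATTRIBUTE_MEMORY_TYPE 
(function and variable names are illustrative):

  #include <cuda.h>
  #include <stdint.h>

  /* Return 1 if buf is CUDA device memory, 0 otherwise.  Assumes an
   * initialized CUDA 4.0 driver context. */
  static int is_device_memory(const void *buf)
  {
      CUmemorytype memtype;
      CUresult res = cuPointerGetAttribute(&memtype,
                                           CU_POINTER_ATTRIBUTE_MEMORY_TYPE,
                                           (CUdeviceptr)(uintptr_t)buf);
      /* Plain (unregistered) host memory typically makes the query fail,
       * so treat any error as "host". */
      if (CUDA_SUCCESS != res) {
          return 0;
      }
      return (CU_MEMORYTYPE_DEVICE == memtype);
  }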

This work does not depend on GPUDirect.  It makes use of the fact that one 
can malloc memory, register it with IB, and register it with CUDA via the new 
4.0 cuMemHostRegister API.  Device memory can then be copied into that host 
memory.
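
A minimal sketch of that staging path, with error handling simplified and the 
IB registration (e.g. ibv_reg_mr) left out; buffer names are illustrative:

  #include <cuda.h>
  #include <stdint.h>
  #include <stdlib.h>

  /* Copy 'bytes' bytes of device memory into a freshly malloc'ed,
   * CUDA-registered host buffer.  Assumes an initialized CUDA 4.0
   * driver context. */
  static int stage_from_device(CUdeviceptr devbuf, size_t bytes)
  {
      void *hostbuf = malloc(bytes);
      if (NULL == hostbuf) {
          return -1;
      }
      /* Page-lock the malloc'ed memory so the GPU can DMA into it. */
      if (CUDA_SUCCESS != cuMemHostRegister(hostbuf, bytes, 0)) {
          free(hostbuf);
          return -1;
      }
      /* With Unified Virtual Addressing, cuMemcpy infers the transfer
       * direction from the two addresses. */
      cuMemcpy((CUdeviceptr)(uintptr_t)hostbuf, devbuf, bytes);

      /* ... hand hostbuf to the network layer here ... */

      cuMemHostUnregister(hostbuf);
      free(hostbuf);
      return 0;
  }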

Rolf

From: devel-boun...@open-mpi.org [mailto:devel-boun...@open-mpi.org] On Behalf 
Of Brice Goglin
Sent: Wednesday, April 13, 2011 1:00 PM
To: de...@open-mpi.org
Subject: Re: [OMPI devel] RFC: Add support to send/receive CUDA device memory 
directly

Hello Rolf,

This "CUDA device memory" isn't memory mapped in the host, right? Then what 
does its address look like? When you say "when it is detected that a buffer is 
CUDA device memory", if the actual device and host address spaces are 
different, how do you know that device addresses and usual host addresses will 
never have the same values?

Do you need GPUDirect for "to improve performance, the internal host buffers 
have to also be registered with the CUDA environment"?

Regards,
Brice



On 13/04/2011 18:47, Rolf vandeVaart wrote:
WHAT: Add support to send data directly from CUDA device memory via MPI calls.

TIMEOUT: April 25, 2011

DETAILS: When programming in a mixed MPI and CUDA environment, one cannot 
currently send data directly from CUDA device memory.  The programmer first 
has to copy the data into host memory and then send it.  On the receiving 
side, the data has to be received into host memory first and then copied into 
CUDA device memory.

This RFC adds the ability to send and receive CUDA device memory directly.
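
For illustration, a rough before/after sketch from the application's point of 
view (function names, datatype, and message tag are made up for the example):

  #include <mpi.h>
  #include <cuda_runtime.h>
  #include <stdlib.h>

  /* Today: the application stages the device data through host memory. */
  void send_staged(const float *devbuf, int count, int dest)
  {
      float *hostbuf = (float *)malloc(count * sizeof(float));
      cudaMemcpy(hostbuf, devbuf, count * sizeof(float),
                 cudaMemcpyDeviceToHost);
      MPI_Send(hostbuf, count, MPI_FLOAT, dest, 0, MPI_COMM_WORLD);
      free(hostbuf);
  }

  /* With this RFC: the device pointer is passed to MPI directly. */
  void send_direct(float *devbuf, int count, int dest)
  {
      MPI_Send(devbuf, count, MPI_FLOAT, dest, 0, MPI_COMM_WORLD);
  }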

There are three basic changes being made to add the support.  First, when it 
is detected that a buffer is CUDA device memory, the protocols that can be 
used are restricted to the ones that first copy data into internal buffers.  
This means that the PUT and RGET protocols will not be used, just the send and 
receive ones.  Second, rather than using memcpy to move the data into and out 
of the host buffers, the library has to use the CUDA copy routine cuMemcpy.  
Lastly, to improve performance, the internal host buffers also have to be 
registered with the CUDA environment (although it is currently unclear how 
helpful that is).
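
As an illustration of the second change, the copy decision looks roughly like 
this (a sketch of the idea only, not the actual convertor code; the 
src_is_device flag would come from querying the buffer address via the CUDA 
4.0 pointer-attribute API):

  #include <cuda.h>
  #include <stdint.h>
  #include <string.h>

  /* Choose the copy routine based on where the source buffer lives. */
  static void pack_copy(void *dst, const void *src, size_t len,
                        int src_is_device)
  {
      if (src_is_device) {
          /* memcpy cannot read device memory; use the CUDA copy instead.
           * With UVA, cuMemcpy accepts host or device addresses on
           * either side. */
          cuMemcpy((CUdeviceptr)(uintptr_t)dst,
                   (CUdeviceptr)(uintptr_t)src, len);
      } else {
          memcpy(dst, src, len);
      }
  }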

By default, the code is disabled and has to be configured into the library:
  --with-cuda(=DIR)       Build cuda support, optionally adding DIR/include,
                                             DIR/lib, and DIR/lib64
  --with-cuda-libdir=DIR  Search for cuda libraries in DIR

An initial implementation can be viewed at:
https://bitbucket.org/rolfv/ompi-trunk-cuda-3

Here is a list of the files being modified so one can see the scope of the 
impact.

$ svn status
M       VERSION
M       opal/datatype/opal_convertor.h
M       opal/datatype/opal_datatype_unpack.c
M       opal/datatype/opal_datatype_pack.h
M       opal/datatype/opal_convertor.c
M       opal/datatype/opal_datatype_unpack.h
M       configure.ac
M       ompi/mca/btl/sm/btl_sm.c
M       ompi/mca/btl/sm/Makefile.am
M       ompi/mca/btl/tcp/btl_tcp_component.c
M       ompi/mca/btl/tcp/btl_tcp.c
M       ompi/mca/btl/tcp/Makefile.am
M       ompi/mca/btl/openib/btl_openib_component.c
M       ompi/mca/btl/openib/btl_openib_endpoint.c
M       ompi/mca/btl/openib/btl_openib_mca.c
M       ompi/mca/mpool/sm/Makefile.am
M       ompi/mca/mpool/sm/mpool_sm_module.c
M       ompi/mca/mpool/rdma/mpool_rdma_module.c
M       ompi/mca/mpool/rdma/Makefile.am
M       ompi/mca/mpool/mpool.h
A       ompi/mca/common/cuda
A       ompi/mca/common/cuda/configure.m4
A       ompi/mca/common/cuda/common_cuda.c
A       ompi/mca/common/cuda/help-mpi-common-cuda.txt
A       ompi/mca/common/cuda/Makefile.am
A       ompi/mca/common/cuda/common_cuda.h
M       ompi/mca/pml/ob1/pml_ob1_component.c
M       ompi/mca/pml/ob1/pml_ob1_sendreq.h
M       ompi/mca/pml/ob1/pml_ob1_recvreq.h
M       ompi/mca/pml/ob1/Makefile.am
M       ompi/mca/pml/base/pml_base_sendreq.h
M       ompi/class/ompi_free_list.c
M       ompi/class/ompi_free_list.h


rvandeva...@nvidia.com<mailto:rvandeva...@nvidia.com>
781-275-5358
