Hello Rolf,

CUDA support is always welcome. Please see my comments below.
+#if OMPI_CUDA_SUPPORT
+    fl->fl_frag_block_alignment = 0;
+    fl->fl_flags = 0;
+#endif

[pasha] It seems that "fl_flags" is a hack that allows you to do the second (CUDA) registration in mpool_rdma:

+#if OMPI_CUDA_SUPPORT
+    if ((flags & MCA_MPOOL_FLAGS_CUDA_MEM) && mca_common_cuda_registered_memory) {
+        mca_common_cuda_register(addr, size,
+                                 mpool->mpool_component->mpool_version.mca_component_name);
+    }
+#endif

[pasha] This is really a _hack_ to enable multiple-device registration. I would prefer to see a new mpool component that supports multiple-device registration, in contrast to the single-device registration in mpool_rdma.

    fl->fl_payload_buffer_size=0;
    fl->fl_payload_buffer_alignment=0;
    fl->fl_frag_class = OBJ_CLASS(ompi_free_list_item_t);
@@ -190,7 +194,19 @@
    alloc_size = num_elements * head_size + sizeof(ompi_free_list_memory_t) +
        flist->fl_frag_alignment;
+#if OMPI_CUDA_SUPPORT
+    /* Hack for TCP since there is no memory pool. */
+    if (flist->fl_frag_block_alignment) {
+        alloc_size = OPAL_ALIGN(alloc_size, 4096, size_t);
+        if((errno = posix_memalign((void *)&alloc_ptr, 4096, alloc_size)) != 0) {
+            alloc_ptr = NULL;
+        }
+    } else {
+        alloc_ptr = (ompi_free_list_memory_t*)malloc(alloc_size);
+    }
+#else
    alloc_ptr = (ompi_free_list_memory_t*)malloc(alloc_size);
+#endif

[pasha] I would prefer not to _hack_ ompi_free_list in order to work around BTL-related issues. This kind of problem should be handled by the tcp btl itself. If you think the free list or mpool interface is not flexible enough, we can discuss updating or modifying the interface; IMHO that is much better than a hack. A rough sketch of what I have in mind is below.
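Just to illustrate, here is a rough, untested sketch of how the tcp btl could own this logic itself: allocate page-aligned fragment blocks and, when CUDA support is enabled, register them through the mca_common_cuda_register() call your patch already provides, so ompi_free_list stays untouched. The helper name and the hard-coded 4096 page size are purely illustrative, not existing code:

#include <stdlib.h>

/* Hypothetical helper, private to btl_tcp: allocate a page-aligned block
 * for fragment payloads and, when CUDA support is compiled in, register
 * it with the CUDA environment via the common code from the patch. */
static void *btl_tcp_alloc_frag_block(size_t size)
{
    void *ptr = NULL;

    /* round up to a page boundary (same effect as OPAL_ALIGN(size, 4096, size_t)) */
    size = (size + 4095) & ~((size_t) 4095);
    if (0 != posix_memalign(&ptr, 4096, size)) {
        return NULL;
    }
#if OMPI_CUDA_SUPPORT
    /* same registration call the patch already uses in mpool_rdma */
    mca_common_cuda_register(ptr, size, "btl_tcp");
#endif
    return ptr;
}

The free list would then be fed these blocks through whatever allocation hook we agree on, instead of special-casing fl_frag_block_alignment inside ompi_free_list itself.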
Regards,
Pavel (Pasha) Shamis
---
Application Performance Tools Group
Computer Science and Math Division
Oak Ridge National Laboratory

On Apr 13, 2011, at 12:47 PM, Rolf vandeVaart wrote:

> WHAT: Add support to send data directly from CUDA device memory via MPI calls.
>
> TIMEOUT: April 25, 2011
>
> DETAILS: When programming in a mixed MPI and CUDA environment, one cannot
> currently send data directly from CUDA device memory. The programmer first
> has to move the data into host memory, and then send it. On the receiving
> side, it has to first be received into host memory, and then copied into CUDA
> device memory.
>
> This RFC adds the ability to send and receive CUDA device memory directly.
>
> There are three basic changes being made to add the support. First, when it
> is detected that a buffer is CUDA device memory, the protocols that can be
> used are restricted to the ones that first copy data into internal buffers.
> This means that we will not be using the PUT and RGET protocols, just the
> send and receive ones. Secondly, rather than using memcpy to move the data
> into and out of the host buffers, the library has to use a special CUDA copy
> routine called cuMemcpy. Lastly, to improve performance, the internal host
> buffers have to also be registered with the CUDA environment (although
> currently it is unclear how helpful that is).
>
> By default, the code is disabled and has to be configured into the library.
> --with-cuda(=DIR)       Build cuda support, optionally adding DIR/include,
>                         DIR/lib, and DIR/lib64
> --with-cuda-libdir=DIR  Search for cuda libraries in DIR
>
> An initial implementation can be viewed at:
> https://bitbucket.org/rolfv/ompi-trunk-cuda-3
>
> Here is a list of the files being modified so one can see the scope of the
> impact.
>
> $ svn status
> M VERSION
> M opal/datatype/opal_convertor.h
> M opal/datatype/opal_datatype_unpack.c
> M opal/datatype/opal_datatype_pack.h
> M opal/datatype/opal_convertor.c
> M opal/datatype/opal_datatype_unpack.h
> M configure.ac
> M ompi/mca/btl/sm/btl_sm.c
> M ompi/mca/btl/sm/Makefile.am
> M ompi/mca/btl/tcp/btl_tcp_component.c
> M ompi/mca/btl/tcp/btl_tcp.c
> M ompi/mca/btl/tcp/Makefile.am
> M ompi/mca/btl/openib/btl_openib_component.c
> M ompi/mca/btl/openib/btl_openib_endpoint.c
> M ompi/mca/btl/openib/btl_openib_mca.c
> M ompi/mca/mpool/sm/Makefile.am
> M ompi/mca/mpool/sm/mpool_sm_module.c
> M ompi/mca/mpool/rdma/mpool_rdma_module.c
> M ompi/mca/mpool/rdma/Makefile.am
> M ompi/mca/mpool/mpool.h
> A ompi/mca/common/cuda
> A ompi/mca/common/cuda/configure.m4
> A ompi/mca/common/cuda/common_cuda.c
> A ompi/mca/common/cuda/help-mpi-common-cuda.txt
> A ompi/mca/common/cuda/Makefile.am
> A ompi/mca/common/cuda/common_cuda.h
> M ompi/mca/pml/ob1/pml_ob1_component.c
> M ompi/mca/pml/ob1/pml_ob1_sendreq.h
> M ompi/mca/pml/ob1/pml_ob1_recvreq.h
> M ompi/mca/pml/ob1/Makefile.am
> M ompi/mca/pml/base/pml_base_sendreq.h
> M ompi/class/ompi_free_list.c
> M ompi/class/ompi_free_list.h
>
>
> rvandeva...@nvidia.com
> 781-275-5358
>
> This email message is for the sole use of the intended recipient(s) and may
> contain confidential information. Any unauthorized review, use, disclosure
> or distribution is prohibited. If you are not the intended recipient, please
> contact the sender by reply email and destroy all copies of the original
> message.
>
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel
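For reference, a minimal sketch of the staging path the RFC describes: when the user buffer is detected to be CUDA device memory, the copy into the internal host buffer goes through the CUDA driver API instead of memcpy. This is only an illustration against the CUDA 4.0 driver API; the detection via cuPointerGetAttribute() is an assumption of this sketch (the RFC does not say how detection is done), and the function and variable names are not taken from the actual patch:

#include <stdint.h>
#include <string.h>
#include <cuda.h>

/* Illustrative only: stage a user buffer into an internal (registered)
 * host buffer, using cuMemcpy when the source is CUDA device memory and
 * plain memcpy otherwise. */
static void stage_to_host(void *host_buf, const void *user_buf, size_t len)
{
    CUmemorytype mem_type = CU_MEMORYTYPE_HOST;

    /* cuPointerGetAttribute() fails for ordinary host pointers; in that
     * case we simply fall back to memcpy. */
    if (CUDA_SUCCESS == cuPointerGetAttribute(&mem_type,
                                              CU_POINTER_ATTRIBUTE_MEMORY_TYPE,
                                              (CUdeviceptr)(uintptr_t)user_buf)
        && CU_MEMORYTYPE_DEVICE == mem_type) {
        /* device -> host copy through unified virtual addressing */
        cuMemcpy((CUdeviceptr)(uintptr_t)host_buf,
                 (CUdeviceptr)(uintptr_t)user_buf, len);
    } else {
        memcpy(host_buf, user_buf, len);
    }
}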