With

* MLNX OFED stack tailored for GPUDirect
* RHEL + kernel patch
* MVAPICH2

it is possible to monitor GPUDirect v1 activity by observing changes to the values in

* /sys/module/ib_core/parameters/gpu_direct_pages
* /sys/module/ib_core/parameters/gpu_direct_shares

After setting CUDA_NIC_INTEROP=1, these values no longer change. Is there a different way now to monitor whether GPUDirect actually works?

Sebastian.
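For reference, a minimal sketch of how those counters could be checked around a transfer. The two sysfs paths are the ones named above; that they hold single integer values readable like ordinary files is an assumption, and whether a change across a transfer is still a reliable indicator once CUDA_NIC_INTEROP=1 is set is exactly the open question here.

    /* Sketch: read the ib_core GPUDirect counters before and after a transfer
     * and report whether they changed.  The paths come from the message above;
     * the plain-integer file format is an assumption. */
    #include <stdio.h>

    static long read_counter(const char *path)
    {
        FILE *f = fopen(path, "r");
        long val = -1;
        if (f) {
            if (fscanf(f, "%ld", &val) != 1)
                val = -1;
            fclose(f);
        }
        return val;
    }

    int main(void)
    {
        const char *pages  = "/sys/module/ib_core/parameters/gpu_direct_pages";
        const char *shares = "/sys/module/ib_core/parameters/gpu_direct_shares";

        long p0 = read_counter(pages),  s0 = read_counter(shares);
        /* ... run the GPUDirect transfer to be checked here ... */
        long p1 = read_counter(pages),  s1 = read_counter(shares);

        printf("gpu_direct_pages:  %ld -> %ld (%s)\n", p0, p1,
               p1 != p0 ? "changed" : "unchanged");
        printf("gpu_direct_shares: %ld -> %ld (%s)\n", s0, s1,
               s1 != s0 ? "changed" : "unchanged");
        return 0;
    }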
On Jan 18, 2012, at 5:06 PM, Kenneth Lloyd wrote:

> It is documented in
> http://developer.download.nvidia.com/compute/cuda/4_0/docs/GPUDirect_Technology_Overview.pdf
>
> set CUDA_NIC_INTEROP=1
>
> From: devel-boun...@open-mpi.org [mailto:devel-boun...@open-mpi.org] On Behalf Of Sebastian Rinke
> Sent: Wednesday, January 18, 2012 8:15 AM
> To: Open MPI Developers
> Subject: Re: [OMPI devel] GPUDirect v1 issues
>
> Setting the environment variable fixed the problem for Open MPI with CUDA 4.0. Thanks!
>
> However, I'm wondering why this is not documented in the NVIDIA GPUDirect package.
>
> Sebastian.
>
> On Jan 18, 2012, at 1:28 AM, Rolf vandeVaart wrote:
>
> Yes, the step outlined in your second bullet is no longer necessary.
>
> Rolf
>
> From: devel-boun...@open-mpi.org [mailto:devel-boun...@open-mpi.org] On Behalf Of Sebastian Rinke
> Sent: Tuesday, January 17, 2012 5:22 PM
> To: Open MPI Developers
> Subject: Re: [OMPI devel] GPUDirect v1 issues
>
> Thank you very much. I will try setting the environment variable and, if required, also use the 4.1 RC2 version.
>
> To clarify things a little bit for me, to set up my machine with GPUDirect v1 I did the following:
>
> * Install RHEL 5.4
> * Use the kernel with GPUDirect support
> * Use the MLNX OFED stack with GPUDirect support
> * Install the CUDA developer driver
>
> Does using CUDA >= 4.0 make one of the above steps redundant?
> I.e., is RHEL, the patched kernel, or the MLNX OFED stack with GPUDirect support no longer needed?
>
> Sebastian.
>
> Rolf vandeVaart wrote:
>
> I ran your test case against Open MPI 1.4.2 and CUDA 4.1 RC2 and it worked fine. I do not have a machine right now where I can load CUDA 4.0 drivers. Any chance you can try CUDA 4.1 RC2? There were some improvements in the support (for one, you do not need to set an environment variable).
> http://developer.nvidia.com/cuda-toolkit-41
>
> There is also a chance that setting the environment variable as outlined in this link may help you.
> http://forums.nvidia.com/index.php?showtopic=200629
>
> However, I cannot explain why MVAPICH would work and Open MPI would not.
>
> Rolf
>
> -----Original Message-----
> From: devel-boun...@open-mpi.org [mailto:devel-boun...@open-mpi.org] On Behalf Of Sebastian Rinke
> Sent: Tuesday, January 17, 2012 12:08 PM
> To: Open MPI Developers
> Subject: Re: [OMPI devel] GPUDirect v1 issues
>
> I use CUDA 4.0 with MVAPICH2 1.5.1p1 and Open MPI 1.4.2.
>
> Attached you find a little test case which is based on the GPUDirect v1 test case (mpi_pinned.c).
> In that program the sender splits a message into chunks and sends them separately to the receiver, which posts the corresponding recvs. It is a kind of pipelining.
>
> In mpi_pinned.c:141 the offsets into the recv buffer are set.
> With the correct offsets, i.e. increasing them, it blocks with Open MPI.
> Using line 142 instead (offset = 0) works.
>
> The tarball attached contains a Makefile where you will have to adjust
>
> * CUDA_INC_DIR
> * CUDA_LIB_DIR
>
> Sebastian
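The attachment itself is not preserved in the archive, but the pattern described above can be sketched roughly as follows. This is not the actual mpi_pinned.c; message size, chunk count, tags, and ranks are invented for illustration. The receiver posts one recv per chunk at an increasing offset into a single buffer allocated with cudaMallocHost(), which is the case reported to hang; receiving every chunk at offset 0 corresponds to the workaround mentioned above.

    /* Rough sketch of the chunked, pipelined transfer described above.
     * NOT the mpi_pinned.c attachment; sizes, tags, and ranks are made up.
     * Rank 0 sends a message in chunks; rank 1 receives each chunk at an
     * increasing offset into one cudaMallocHost() buffer. */
    #include <mpi.h>
    #include <cuda_runtime.h>
    #include <stdio.h>

    #define MSG_SIZE   (4 * 1024 * 1024)   /* total message size in bytes */
    #define NUM_CHUNKS 8
    #define CHUNK_SIZE (MSG_SIZE / NUM_CHUNKS)

    int main(int argc, char **argv)
    {
        int rank;
        char *buf;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        /* Pinned host memory, as in the GPUDirect v1 test case. */
        cudaMallocHost((void **)&buf, MSG_SIZE);

        for (int i = 0; i < NUM_CHUNKS; i++) {
            size_t offset = (size_t)i * CHUNK_SIZE;   /* increasing offsets: the case that hangs */
            /* size_t offset = 0; */                  /* offset 0 every time: the case that works */

            if (rank == 0)
                MPI_Send(buf + offset, CHUNK_SIZE, MPI_CHAR, 1, i, MPI_COMM_WORLD);
            else if (rank == 1)
                MPI_Recv(buf + offset, CHUNK_SIZE, MPI_CHAR, 0, i, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
        }

        if (rank == 1)
            printf("all %d chunks received\n", NUM_CHUNKS);

        cudaFreeHost(buf);
        MPI_Finalize();
        return 0;
    }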
> On Jan 17, 2012, at 4:16 PM, Kenneth A. Lloyd wrote:
>
> Also, which version of MVAPICH2 did you use?
>
> I've been poring over Rolf's OpenMPI CUDA RDMA 3 (using CUDA 4.1 r2) vis-à-vis MVAPICH-GPU on a small 3 node cluster. These are wickedly interesting.
>
> Ken
>
> -----Original Message-----
> From: devel-boun...@open-mpi.org [mailto:devel-bounces@open-mpi.org] On Behalf Of Rolf vandeVaart
> Sent: Tuesday, January 17, 2012 7:54 AM
> To: Open MPI Developers
> Subject: Re: [OMPI devel] GPUDirect v1 issues
>
> I am not aware of any issues. Can you send me a test program and I can try it out?
> Which version of CUDA are you using?
>
> Rolf
>
> -----Original Message-----
> From: devel-boun...@open-mpi.org [mailto:devel-bounces@open-mpi.org] On Behalf Of Sebastian Rinke
> Sent: Tuesday, January 17, 2012 8:50 AM
> To: Open MPI Developers
> Subject: [OMPI devel] GPUDirect v1 issues
>
> Dear all,
>
> I'm using GPUDirect v1 with Open MPI 1.4.3 and see blocking MPI_SEND/RECV calls block forever.
>
> For two subsequent MPI_RECVs, it hangs if the recv buffer pointer of the second recv points somewhere other than the beginning of the recv buffer (previously allocated with cudaMallocHost()).
>
> I tried the same with MVAPICH2 and did not see the problem.
>
> Does anybody know about issues with GPUDirect v1 using Open MPI?
>
> Thanks for your help,
> Sebastian