On 24.11.2013, at 10:22, Ralph Castain <r...@open-mpi.org> wrote:
The CUDA support in the 1.7 series has been evolving: a number of patches have been applied since 1.7.3 was released, and I see another (for optimization) scheduled.
You might try the 1.7.4 nightly tarball and see if the problem has been fixed.
Same problem with the 1.7.4 nightly. But when I compiled and ran my little test program on a machine with actual InfiniBand hardware, the problem disappeared! I guess OB1 is not selected at runtime on machines with InfiniBand hardware? Is this correct?
Sounds like a bug to me - if cuda is being used, we need to select ob1 regardless. I'll have to let Rolf figure that one out.
I still believe that ompi/mca/pml/ob1/* is not linked to common_cuda.*, although it should be. I’m slightly overwhelmed by automake, so I don’t know how to add this reference and try it myself.
Try the attached - should fix the problem.
Attachment: pml.diff (binary data)
j
On Nov 24, 2013, at 7:11 AM, Jörg Bornschein <j...@capsec.org> wrote:
On 23.11.2013, at 22:56, Dmitry N. Mikushin <maemar...@gmail.com> wrote:
VT gets out of sync with CUDA from time to time; this has already happened before.
Yes, that’s what I thought, and that’s why I didn’t mention it as my main issue.
I’m rather stuck because cuda support and ob1 don’t seem to fit together — at least on my systems.
j
- D.
2013/11/24 Jörg Bornschein <j...@capsec.org>:
On 23.11.2013, at 21:42, Jörg Bornschein <j...@capsec.org> wrote:
Sorry,
I’m typically compiling with
./configure --with-cuda
I’m actually compiling with
./configure --with-cuda --disable-vt
because otherwise I get a compile time error:
make[5]: Entering directory `/u/bornj/software-old/src/openmpi-1.7.3/ompi/contrib/vt/vt/vtlib'
  CC libvt_la-vt_cudart.lo
  CC libvt_mpi_la-vt_pform_linux.lo
  CC libvt_mpi_la-vt_thrd.lo
  CC libvt_mpi_la-vt_trc.lo
  CC libvt_mpi_la-vt_user_comment.lo
  CC libvt_mpi_la-vt_user_control.lo
  CC libvt_mpi_la-vt_user_count.lo
  CC libvt_mpi_la-vt_user_marker.lo
vt_cudart.c: In function 'cudaLaunch':
vt_cudart.c:2725:15: error: 'vt_cupti_events_enabled' undeclared (first use in this function)
vt_cudart.c:2725:15: note: each undeclared identifier is reported only once for each function it appears in
j
but I tried combining it with various other options. OMPI builds fine, but when I try to run programs compiled against it I always get:
/a.out: symbol lookup error: /usr/local/lib/openmpi/mca_pml_ob1.so: undefined symbol: progress_one_cuda_htod_event
That error even seems to make sense, because the code in ompi/mca/pml/ob1/ refers to common_cuda.[ch], but it does not seem to link against its shared library.
Am I missing something?
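One way to check the diagnosis above is to list the dynamic symbols that a shared object leaves undefined, and the ones a candidate library defines. The following is only a minimal sketch, written in Python as a neutral scripting choice; it is not part of the original thread. It assumes GNU `nm` (binutils) is on the PATH, and the helper names `parse_nm_symbols` and `dynamic_symbols` are hypothetical.

```python
import subprocess


def parse_nm_symbols(nm_output):
    """Return the set of symbol names found in `nm -D` output.

    Each symbol line ends with the symbol name; nm may append a
    version suffix such as "@GLIBC_2.14", which is stripped here.
    """
    names = set()
    for line in nm_output.splitlines():
        fields = line.split()
        if len(fields) < 2:
            # Blank lines or file-name headers carry no symbol.
            continue
        names.add(fields[-1].split("@")[0])
    return names


def dynamic_symbols(path, undefined):
    """Run GNU nm on a shared object and return its dynamic symbols.

    undefined=True  -> symbols the object expects from elsewhere
    undefined=False -> symbols the object itself defines
    """
    flag = "--undefined-only" if undefined else "--defined-only"
    result = subprocess.run(
        ["nm", "-D", flag, path],
        capture_output=True,
        text=True,
        check=True,  # raise if nm fails, e.g. the file does not exist
    )
    return parse_nm_symbols(result.stdout)
```

Calling `dynamic_symbols("/usr/local/lib/openmpi/mca_pml_ob1.so", undefined=True)` should include `progress_one_cuda_htod_event` if the plugin expects another library to provide it. The same call with `undefined=False` against the candidate CUDA support library shows whether that library defines the symbol. The name and location of that library depend on the installation and are not stated in this thread.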
Thanks!
jb
_______________________________________________ devel mailing list de...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/devel