On Nov 24, 2013, at 8:30 AM, Jörg Bornschein <j...@capsec.org> wrote:

On 24.11.2013, at 10:22, Ralph Castain <r...@open-mpi.org> wrote:

The cuda support in the 1.7 series has been evolving - a number of patches have been applied since 1.7.3 was released, and I see another (for optimization) scheduled.

You might try the 1.7.4 nightly tarball and see if the problem has been fixed.


Same problem with 1.7.4-nightly.

But I compiled and started my little test program on a machine with actual InfiniBand hardware
and the problem disappeared! I guess on machines with InfiniBand hardware OB1 is not
selected at runtime? Is this correct?

Sounds like a bug to me - if cuda is being used, we need to select ob1 regardless. I'll have to let Rolf figure that one out.



I still believe that ompi/mca/pml/ob1/* is not linked to common_cuda.*, although it
should be. I’m slightly overwhelmed by automake, so I don’t know how to add this
reference and try it myself.

Try the attached - should fix the problem.

Attachment: pml.diff
Description: Binary data



   j 





On Nov 24, 2013, at 7:11 AM, Jörg Bornschein <j...@capsec.org> wrote:

On 23.11.2013, at 22:56, Dmitry N. Mikushin <maemar...@gmail.com> wrote:

VT is getting out of sync with CUDA from time to time, this already
happened before.

Yes, that’s what I thought and that’s why I didn’t mention it as my main issue.



I’m rather stuck because cuda support and ob1 don’t seem to fit together, at least on my systems.


 j



- D.


2013/11/24 Jörg Bornschein <j...@capsec.org>:
On 23.11.2013, at 21:42, Jörg Bornschein <j...@capsec.org> wrote:

Sorry,

I’m typically compiling with

./configure --with-cuda


I’m actually compiling with

./configure --with-cuda --disable-vt

because otherwise I get a compile time error:

make[5]: Entering directory `/u/bornj/software-old/src/openmpi-1.7.3/ompi/contrib/vt/vt/vtlib'
CC       libvt_la-vt_cudart.lo
CC       libvt_mpi_la-vt_pform_linux.lo
CC       libvt_mpi_la-vt_thrd.lo
CC       libvt_mpi_la-vt_trc.lo
CC       libvt_mpi_la-vt_user_comment.lo
CC       libvt_mpi_la-vt_user_control.lo
CC       libvt_mpi_la-vt_user_count.lo
CC       libvt_mpi_la-vt_user_marker.lo
vt_cudart.c: In function 'cudaLaunch':
vt_cudart.c:2725:15: error: 'vt_cupti_events_enabled' undeclared (first use in this function)
vt_cudart.c:2725:15: note: each undeclared identifier is reported only once for each function it appears in



 j



but I tried combining it with various other options. OMPI builds fine, but when I try to run programs compiled against it I always get:

/a.out: symbol lookup error: /usr/local/lib/openmpi/mca_pml_ob1.so: undefined symbol: progress_one_cuda_htod_event

That error even seems to make sense, because the code in ompi/mca/pml/ob1/ refers to common_cuda.[ch], but it does not
seem to link against its shared library.
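
One way to check this suspicion is to look at the component's dynamic symbol table. The following is only a minimal sketch, assuming GNU binutils (`nm`) and `ldd` are installed; the library path and symbol name are taken from the error message above, and the helper name `needs_symbol` is hypothetical:

```shell
#!/bin/sh
# Path and symbol come from the error message above; override LIB or SYM
# in the environment if your install prefix differs (assumption).
LIB=${LIB:-/usr/local/lib/openmpi/mca_pml_ob1.so}
SYM=${SYM:-progress_one_cuda_htod_event}

# Hypothetical helper: return 0 if shared object $1 lists symbol $2
# as undefined in its dynamic symbol table, non-zero otherwise.
needs_symbol() {
    nm -D --undefined-only "$1" 2>/dev/null | grep -qw "$2"
}

if [ ! -f "$LIB" ]; then
    echo "not found: $LIB"
elif needs_symbol "$LIB" "$SYM"; then
    echo "$SYM is undefined in $LIB; libraries it links against:"
    # If no common_cuda library appears in this list, nothing the
    # component links against can supply the symbol at load time.
    ldd "$LIB"
else
    echo "$SYM is not an undefined symbol of $LIB"
fi
```

If the symbol is reported as undefined and the `ldd` output lists no common_cuda library, that would be consistent with the missing link dependency described above.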

Am I missing something?


Thanks!


jb

_______________________________________________
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel
