I can replicate this on thor with the trunk. This looks like a multi-NIC issue, as we pass the test when I restrict Open MPI to use a single IB NIC. I will dig into this further, but should we consider the priority of multi-NIC support for the 1.0.1 release?
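For reference, restricting the run to a single IB NIC amounts to an invocation along these lines. This is a sketch: the btl_mvapi_max_btls parameter name and the ./MPI_Probe_tag_c binary name are my assumptions, so verify the exact parameter against ompi_info before relying on it.

```shell
# Run the test over the mvapi BTL only (plus "self" for loopback),
# limiting the component to a single HCA/NIC.
# NOTE: btl_mvapi_max_btls is an assumed parameter name -- confirm with:
#   ompi_info --param btl mvapi
mpirun -np 2 \
    --mca btl mvapi,self \
    --mca btl_mvapi_max_btls 1 \
    ./MPI_Probe_tag_c
```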


Thanks,

Galen

On Nov 28, 2005, at 7:42 PM, Galen M. Shipman wrote:

Hi Andrew,

I am not able to replicate this on odin with 16 nodes using the trunk or the v1.0 branch. How many nodes were you running with?

Thanks,

Galen


On Nov 23, 2005, at 5:46 PM, Andrew Friedley wrote:

I'm running the intel test suite against ompi revision r8247 (v1.0
branch), and the MPI_Probe_tag_c test is hanging on IU's thor cluster.
This only happens when using mvapi, not with gm or tcp.  The hang
happens whether or not I use sm alongside mvapi.

The processes appear to be spinning on the CPU, and a backtrace of one
of them looks like the following:

(gdb) bt
#0  0x40341754 in ioctl () from /lib/libc.so.6
#1 0x404bbe99 in vip_ioctl_wrapper (ops=VIPKL_OPEN_HCA, pi=0x0, pi_sz=0,
     po=0x0, po_sz=0) at vipkl_sys_user.c:54
#2  0x404bb886 in VIPKL_EQ_poll (usr_ctx=0x0, hca_hndl=0, vipkl_eq=0,
     eqe_p=0x40de3eb4) at vipkl_wrap_user.c:1676
#3 0x404bc0e1 in eq_poll_thread (eq_pollt_ptr=0x81377f8) at hobul.c:320
#4  0x4024aef6 in pthread_start_thread () from /lib/libpthread.so.0
#5  0x4034823a in clone () from /lib/libc.so.6


I'm not sure how useful this is -- can someone else reproduce it? If more
information is needed, let me know.

Andrew
_______________________________________________
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel
