This is a known problem and should be fixed by now.
A bad patch somehow got into OFED that was not in
Sean's main tree. Assuming this bad patch has been
removed, the problem should be fixed.

 
woody
 

________________________________

From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Boris
Shpolyansky
Sent: Friday, March 09, 2007 8:40 PM
To: [email protected]
Subject: [ofa-general] uDAPL question


Hi, 
 
I'm trying to get a simple Intel MPI benchmark running over IB (uDAPL)
using the OFED-1.1 stack.
I consistently get the following error:
 
[EMAIL PROTECTED] ~]# ./runjob_I_MPI.boris 2
Task 0 of 2 tasks started on host ibd005.ibd.mti.com
clock_resolution = 1.00e-06 s
Task 1 of 2 tasks started on host ibd006.ibd.mti.com
[0:ibd005] unexpected DAPL event 4006 from 1:ibd006
[1:ibd006] unexpected DAPL event 4006 from 0:ibd005
rank 0 in job 14  ibd005_36193   caused collective abort of all ranks
  exit status of rank 0: return code 254 

I did some digging and found out that event 4006 (actually 0x4006) means
DAT_CONNECTION_EVENT_BROKEN
and that it is returned by the function dat_rmr_bind.
So my question is why this function consistently fails.
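
For anyone hitting the same message, here is a rough sketch of how the event number in the log line can be read as hex and mapped to a name. The only mapping it contains is the one quoted above (0x4006 = DAT_CONNECTION_EVENT_BROKEN). The regular expression, the function name decode_dapl_event and the KNOWN_EVENTS table are made up for illustration and assume the log line has exactly the shape shown in the output above; further codes would have to be taken from your own dat.h.

```python
import re

# Only the code quoted in this thread is listed here (assumption: any
# other entries should be copied from the dat.h shipped with your stack).
KNOWN_EVENTS = {0x4006: "DAT_CONNECTION_EVENT_BROKEN"}

# Hypothetical pattern matching lines such as:
#   [0:ibd005] unexpected DAPL event 4006 from 1:ibd006
LINE_RE = re.compile(
    r"\[(\d+):(\S+?)\] unexpected DAPL event ([0-9a-fA-F]+) from (\d+):(\S+)"
)


def decode_dapl_event(line):
    """Return rank, host, event code and event name for one log line, or None."""
    match = LINE_RE.search(line)
    if match is None:
        return None
    code = int(match.group(3), 16)  # the number is printed in hex without "0x"
    return {
        "rank": int(match.group(1)),
        "host": match.group(2),
        "code": code,
        "name": KNOWN_EVENTS.get(code, "unknown"),
        "peer_rank": int(match.group(4)),
        "peer_host": match.group(5),
    }


if __name__ == "__main__":
    print(decode_dapl_event("[0:ibd005] unexpected DAPL event 4006 from 1:ibd006"))
```

Run over the job output, this makes it easy to see which rank reported the broken connection and which peer it was talking to.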
I'm using the standard dat.conf file:
 
OpenIB-cma u1.2 nonthreadsafe default /usr/local/ofed/lib64/libdaplcma.so mv_dapl.1.2 "ib0 0" ""
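
To make the entry above easier to check, here is a minimal sketch that splits it into named fields. The field names (ia_name, api_version, thread_safety, default, lib_path, provider_version, ia_params, platform_params) are my understanding of the usual dat.conf layout and should be treated as an assumption, as should the helper name parse_dat_conf_entry; the values come straight from the entry quoted above.

```python
import shlex

# Assumed field order of a dat.conf entry (not confirmed in this thread).
FIELDS = (
    "ia_name",
    "api_version",
    "thread_safety",
    "default",
    "lib_path",
    "provider_version",
    "ia_params",
    "platform_params",
)


def parse_dat_conf_entry(entry):
    """Split one dat.conf entry into a dict keyed by the assumed field names."""
    # shlex keeps quoted strings such as "ib0 0" together, keeps the empty
    # "" as an empty token, and treats a wrapped line break as whitespace.
    tokens = shlex.split(entry, comments=True)
    if len(tokens) != len(FIELDS):
        raise ValueError(
            "expected %d fields, got %d: %r" % (len(FIELDS), len(tokens), tokens)
        )
    return dict(zip(FIELDS, tokens))


if __name__ == "__main__":
    entry = (
        "OpenIB-cma u1.2 nonthreadsafe default\n"
        '/usr/local/ofed/lib64/libdaplcma.so mv_dapl.1.2 "ib0 0" ""'
    )
    for name, value in parse_dat_conf_entry(entry).items():
        print("%-17s %r" % (name, value))
```

With the fields separated, it is quick to confirm that lib_path points at a library that exists on both hosts and that ia_params names the interface you expect ("ib0 0" here).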

Appreciate your help,
 
Boris Shpolyansky 
_______________________________________________
general mailing list
[email protected]
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general
