This is a known problem and should be fixed by now. A bad patch somehow got into OFED that was not in Sean's main tree. Assuming that bad patch has been removed, the problem should be fixed.
woody

________________________________
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Boris Shpolyansky
Sent: Friday, March 09, 2007 8:40 PM
To: [email protected]
Subject: [ofa-general] uDAPL question

Hi,

I'm trying to get simple Intel MPI benchmark running over IB (uDAPL) using OFED-1.1 stack. I'm consistently getting the following error:

[EMAIL PROTECTED] ~]# ./runjob_I_MPI.boris 2
Task 0 of 2 tasks started on host ibd005.ibd.mti.com
clock_resolution = 1.00e-06 s
Task 1 of 2 tasks started on host ibd006.ibd.mti.com
[0:ibd005] unexpected DAPL event 4006 from 1:ibd006
[1:ibd006] unexpected DAPL event 4006 from 0:ibd005
rank 0 in job 14 ibd005_36193 caused collective abort of all ranks
exit status of rank 0: return code 254

I did some digging and found out that event 4006 (actually 0x4006) means DAT_CONNECTION_EVENT_BROKEN and it is returned by function dat_rmr_bind. So my question is why this function consistently fails.

I'm using standard dat.conf file:

OpenIB-cma u1.2 nonthreadsafe default /usr/local/ofed/lib64/libdaplcma.so mv_dapl.1.2 "ib0 0" ""

Appreciate your help,

Boris Shpolyansky

_______________________________________________
general mailing list
[email protected]
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
