Hi,

I'm trying to get a simple Intel MPI benchmark running over IB (uDAPL) using the OFED-1.1 stack. I consistently get the following error:

[EMAIL PROTECTED] ~]# ./runjob_I_MPI.boris 2
Task 0 of 2 tasks started on host ibd005.ibd.mti.com
clock_resolution = 1.00e-06 s
Task 1 of 2 tasks started on host ibd006.ibd.mti.com
[0:ibd005] unexpected DAPL event 4006 from 1:ibd006
[1:ibd006] unexpected DAPL event 4006 from 0:ibd005
rank 0 in job 14 ibd005_36193 caused collective abort of all ranks
exit status of rank 0: return code 254
I did some digging and found that event 4006 (actually 0x4006) means DAT_CONNECTION_EVENT_BROKEN, and that it is returned by the function dat_rmr_bind. So my question is: why does this function consistently fail? I'm using the standard dat.conf file:

OpenIB-cma u1.2 nonthreadsafe default /usr/local/ofed/lib64/libdaplcma.so mv_dapl.1.2 "ib0 0" ""

Appreciate your help,
Boris Shpolyansky
_______________________________________________
general mailing list
[email protected]
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
