>Even though I force all ranks to use only the first card (ib0), it works for a while
>and then fails with NON_PEER_REJECTED when one rank tries to connect to another rank
>(dat_connect() and dat_evd_wait()). (I run a simple MPI job in an infinite loop; it
>fails after hundreds of runs.)
This sounds like it could be a race condition caused by running the test in a loop. If the client starts before the server is listening, it will receive this sort of reject event.

>It works on the first card (ib0) but fails on the second card (ib1).

Please take a look at the following thread:

http://lists.openfabrics.org/pipermail/general/2007-May/036559.html

In particular, see Steve's message about this:

http://lists.openfabrics.org/pipermail/general/2007-May/036571.html

and let me know if his suggestion fixes your problem. I will update the librdmacm documentation with this information as well.

- Sean
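If the reject in the looping test is the startup race described above (a client connecting before the server listens), one common workaround on the client side is to treat a peer reject as "not ready yet" and retry with a growing delay. The sketch below is hypothetical and not part of the uDAPL or librdmacm APIs: conn_result, connect_fn and connect_with_retry() are illustrative names, and the try_connect callback stands in for whatever dat_connect()/dat_evd_wait() sequence the application really performs. It does not address the separate ib1 failure, which the thread linked above covers.

```c
#define _POSIX_C_SOURCE 199309L /* for nanosleep() under strict C modes */
#include <stddef.h>
#include <time.h>

/* Hypothetical result codes: a real client would map its own connection
 * events (e.g. a non-peer reject) onto these. */
typedef enum { CONN_OK, CONN_REJECTED, CONN_ERROR } conn_result;

/* Hypothetical callback performing one connection attempt, standing in
 * for the application's dat_connect()/dat_evd_wait() sequence. */
typedef conn_result (*connect_fn)(void *ctx);

static void sleep_ms(long ms)
{
    struct timespec ts = { ms / 1000, (ms % 1000) * 1000000L };
    nanosleep(&ts, NULL);
}

/* Retry while the peer keeps rejecting, doubling the delay after each
 * reject so a server that has not started listening yet has time to do so.
 * Any other result (success or a different error) stops the loop.
 * Returns the last result; *attempts_out receives the number of attempts. */
conn_result connect_with_retry(connect_fn try_connect, void *ctx,
                               int max_attempts, long initial_delay_ms,
                               int *attempts_out)
{
    long delay = initial_delay_ms;
    conn_result r = CONN_ERROR;
    int attempt;

    for (attempt = 1; attempt <= max_attempts; attempt++) {
        r = try_connect(ctx);
        if (r != CONN_REJECTED)
            break; /* connected, or an error that retrying will not fix */
        if (attempt < max_attempts) {
            sleep_ms(delay);
            delay *= 2; /* exponential backoff */
        }
    }
    if (attempts_out)
        *attempts_out = attempt > max_attempts ? max_attempts : attempt;
    return r;
}
```

Retrying only hides the race on the client side; the alternative is to make sure the server side is listening before any client starts, for example with a barrier between the listen and connect phases of the job.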
