>Even though I force all ranks to use only the first card (ib0), it works
>for a while and then fails with NON_PEER_REJECTED when one rank tries to
>connect to another rank (dat_connect() and dat_evd_wait()). I run a
>simple MPI job in an infinite loop; it fails after hundreds of runs.

This sounds like it could be a race condition as a result of running the test in
a loop.  If the client starts before the server is listening, it will receive
this sort of reject event.
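
As a rough sketch of how a client could tolerate that race (Python for
brevity, not actual DAT code): retry the connect with a growing delay when
the peer rejects it. The names PeerRejected, try_connect and
connect_with_retry are hypothetical; try_connect stands in for whatever
wraps dat_connect() and dat_evd_wait() in your application, and the attempt
count and delays are arbitrary assumptions.

```python
import time


class PeerRejected(Exception):
    """Hypothetical stand-in for a NON_PEER_REJECTED connection event."""


def connect_with_retry(try_connect, max_attempts=5, initial_delay=0.01,
                       sleep=time.sleep):
    """Call try_connect() until it succeeds or max_attempts is reached.

    try_connect is a caller-supplied callable (an assumption of this
    sketch) that returns a connection object or raises PeerRejected.
    The delay doubles after every rejected attempt.
    """
    delay = initial_delay
    for attempt in range(1, max_attempts + 1):
        try:
            return try_connect()
        except PeerRejected:
            if attempt == max_attempts:
                raise  # the server never started listening; give up
            sleep(delay)
            delay *= 2


if __name__ == "__main__":
    # Simulated server that only starts listening before the third attempt.
    state = {"attempts": 0}

    def simulated_connect():
        state["attempts"] += 1
        if state["attempts"] < 3:
            raise PeerRejected()
        return "connected"

    print(connect_with_retry(simulated_connect), "after",
          state["attempts"], "attempts")
```

This only hides the startup race on the client side; making sure the server
is listening before any client connects avoids the reject in the first place.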

>It works on the first card (ib0), but fails on the second card (ib1).

Please take a look at the following thread:

http://lists.openfabrics.org/pipermail/general/2007-May/036559.html

In particular, see Steve's message about this:

http://lists.openfabrics.org/pipermail/general/2007-May/036571.html

and let me know if his suggestion fixes your problem.

I will update the librdmacm documentation with this information as well.

- Sean