Sean:
        Thanks, I think this solves our problem. Currently the two cards
are on different subnets, and code on either subnet is working reliably.
I have not tried putting all cards on the same subnet.

        Do you recommend configuring a single subnet or two subnets?


--CQ 

> -----Original Message-----
> From: Sean Hefty [mailto:[EMAIL PROTECTED] 
> Sent: Friday, July 06, 2007 11:48 AM
> To: Tang, Changqing; Arlin Davis
> Cc: Vladimir Sokolovsky; OpenFabrics General
> Subject: RE: [ofa-general] [GIT PULL] OFED 1.2 uDAPL release notes
> 
> >Even though I force all ranks to use only the first card (ib0), it
> >works for a while and then fails with NON_PEER_REJECTED when one rank
> >tries to connect to another rank (dat_connect() and dat_evd_wait()).
> >(I run a simple MPI job in an infinite loop; it fails after hundreds
> >of runs.)
> 
> This sounds like it could be a race condition as a result of 
> running the test in a loop.  If the client starts before the 
> server is listening, it will receive this sort of reject event.
> 
> >It works on the first card (ib0), failed on the second card (ib1)
> 
> Please take a look at the following thread:
> 
> http://lists.openfabrics.org/pipermail/general/2007-May/036559.html
> 
> In particular, see Steve's message about this:
> 
> http://lists.openfabrics.org/pipermail/general/2007-May/036571.html
> 
> and let me know if his suggestion fixes your problem.
> 
> I will update the librdmacm documentation with this 
> information as well.
> 
> - Sean
> 
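
The race described in the quoted message (a client that connects before
the server is listening receives a reject event) is commonly worked around
by retrying the connect with a short, growing delay. Below is a minimal
sketch in C of that idea only. It does not call uDAPL or librdmacm: the
connect_fn callback is a hypothetical wrapper that an application would
implement around its own dat_connect()/dat_evd_wait() sequence, returning
0 on success and -1 on a reject. The fake_server stub, the attempt limit,
and the delay values are likewise assumptions made for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <time.h>

/* Hypothetical callback: returns 0 when connected, -1 when rejected. */
typedef int (*connect_fn)(void *ctx);

/*
 * Call try_connect up to max_attempts times, sleeping between attempts
 * and doubling the delay each time. Returns the attempt number that
 * succeeded (1-based), or -1 if every attempt was rejected.
 */
static int connect_with_retry(connect_fn try_connect, void *ctx,
                              int max_attempts, long delay_ms)
{
    for (int attempt = 1; attempt <= max_attempts; attempt++) {
        if (try_connect(ctx) == 0)
            return attempt;

        if (attempt < max_attempts) {
            struct timespec ts;
            ts.tv_sec = delay_ms / 1000;
            ts.tv_nsec = (delay_ms % 1000) * 1000000L;
            nanosleep(&ts, NULL);
            delay_ms *= 2;      /* back off before the next attempt */
        }
    }
    return -1;
}

/* Stand-in for a server that rejects a few connects before listening. */
struct fake_server {
    int rejects_left;
};

static int fake_connect(void *ctx)
{
    struct fake_server *server = ctx;

    if (server->rejects_left > 0) {
        server->rejects_left--;
        return -1;              /* simulated reject: not listening yet */
    }
    return 0;                   /* simulated accept */
}
```

With a fake_server whose rejects_left is 3, calling
connect_with_retry(fake_connect, &server, 5, 1) succeeds on the fourth
attempt. Retrying only hides a startup race; if rejects persist after
the server is known to be listening, the cause lies elsewhere.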
_______________________________________________
general mailing list
[email protected]
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
