Roland:

Thanks for the suggestion. What is the minimum safe timeout value for a typical IB network with 2-3 levels of switches?
--CQ

> -----Original Message-----
> From: Roland Dreier [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, April 11, 2007 10:48 PM
> To: Tang, Changqing
> Cc: Sean Hefty; [EMAIL PROTECTED]
> Subject: Re: [ofa-general] RE: How fast to get RDMA_CM_EVENT_DISCONNECTED ?
>
>  > Yes, Internally in A, if the # of receives exceeds lowwater(4), an ack
>  > will be sent back. I assume ACK is not triggered at the moment.
>  >
>  > when A is trying to receive a message from B, and the message never
>  > shows, A actually sends a heart beat back to B, however, it takes
>  > several seconds for this heart-beat to complete with error ( we
>  > configure timeout ~1 sec, and retry count 7).
>  >
>  > Several seconds to detect connection failure is not acceptable for us,
>  > so if I use rdmacm, I want to know if I detect the connection
>  > failure faster than heart-beat message.
>
> I think there is an internal contradiction in what you're doing here.
> If your (ACK timeout) * (retry count) exceeds the time that
> you consider acceptable to detect a failure, then you've set
> your connection up wrong. It's not even meaningful to talk
> about a connection failing faster than this amount of time --
> a connection will recover from a transient network failure
> that resolves itself before the last retry fails, and without
> a time machine it's impossible to say whether a network
> failure will or will not be resolved 7 seconds into the future.
>
> Certainly if you receive a disconnect request, then you know
> the remote side is really and truly gone. But if you've set
> your timeouts/retry counts so that connections will take 7
> seconds to fail after an event like a link going down, then
> there's no way to detect that failure before it occurs.
>
> It seems to me the solution is to reduce your timeout and/or
> retry count so that connections fail within the time scale
> that you require.
>
>  - R.
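
For reference, the (ACK timeout) * (retry count) estimate in the quoted message can be worked out numerically. The sketch below is only an illustration, not something from the thread: it assumes the usual InfiniBand encoding in which the QP timeout attribute is an exponent n and the nominal Local ACK Timeout is 4.096 us * 2^n, and it uses the simple product from the quoted message as the estimate. Real HCAs may wait longer than the nominal value, so treat the results as rough lower bounds. The function names are hypothetical.

```python
# Rough estimate of how long an RC connection takes to report a failure,
# using the (ACK timeout) * (retry count) product from the quoted message.
# Assumption: nominal Local ACK Timeout = 4.096 us * 2^timeout_exp.

ACK_TIMEOUT_BASE_US = 4.096


def ack_timeout_seconds(timeout_exp):
    """Nominal ACK timeout in seconds for a given timeout exponent (1..31)."""
    if not 1 <= timeout_exp <= 31:
        # An exponent of 0 conventionally means "no timeout", so it is excluded.
        raise ValueError("timeout_exp must be in 1..31")
    return ACK_TIMEOUT_BASE_US * (2 ** timeout_exp) / 1e6


def failure_detection_seconds(timeout_exp, retry_cnt):
    """Estimated time until the last retry fails (simple product)."""
    if not 1 <= retry_cnt <= 7:
        # The retry count is a 3-bit field; 0 is excluded in this sketch
        # because the simple product would misleadingly give zero.
        raise ValueError("retry_cnt must be in 1..7")
    return ack_timeout_seconds(timeout_exp) * retry_cnt


def largest_timeout_exp(budget_seconds, retry_cnt):
    """Largest exponent whose estimated detection time fits the budget."""
    for exp in range(31, 0, -1):
        if failure_detection_seconds(exp, retry_cnt) <= budget_seconds:
            return exp
    return None  # even the smallest exponent exceeds the budget


if __name__ == "__main__":
    # Roughly the configuration described above: ~1 sec timeout, retry count 7.
    print(f"exp=18, retry=7: {failure_detection_seconds(18, 7):.2f} s")
    # Largest exponent that keeps the estimate within a 1 second budget.
    print("largest exp for 1 s budget:", largest_timeout_exp(1.0, 7))
```

Under these assumptions an exponent of 18 is about 1.07 sec per attempt, which matches the "~1 sec, retry count 7" setup and its roughly 7 second failure time. Whether a given smaller exponent is safe on a multi-level switch fabric is a separate question that this arithmetic does not answer.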
