> Yes. Internally in A, if the # of receives exceeds lowwater (4), an ack
> will be sent back. I assume the ACK is not triggered at the moment.
> When A is trying to receive a message from B, and the message never
> shows, A actually sends a heartbeat back to B; however, it takes
> several seconds for this heartbeat to complete with an error (we
> configure timeout ~1 sec, and retry count 7).
>
> Several seconds to detect connection failure is not acceptable for us,
> so if I use rdmacm, I want to know if I can detect the connection
> failure faster than the heartbeat message.
I think there is an internal contradiction in what you're doing here. If your (ACK timeout) * (retry count) exceeds the time that you consider acceptable to detect a failure, then you've set your connection up wrong.

It's not even meaningful to talk about a connection failing faster than this amount of time -- a connection will recover from a transient network failure that resolves itself before the last retry fails, and without a time machine it's impossible to say whether a network failure will or will not be resolved 7 seconds into the future.

Certainly if you receive a disconnect request, then you know the remote side is really and truly gone. But if you've set your timeouts/retry counts so that connections will take 7 seconds to fail after an event like a link going down, then there's no way to detect that failure before it occurs.

It seems to me the solution is to reduce your timeout and/or retry count so that connections fail within the time scale that you require.

- R.
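
To put rough numbers on the timeout/retry trade-off, here is a minimal sketch of the arithmetic. It assumes the usual InfiniBand encoding in which the QP's local ACK timeout field is an exponent (nominal timeout = 4.096 usec * 2^exponent, with 0 meaning no timeout) and the retry count is a 3-bit value. It also assumes the first transmission and each retry wait one full nominal timeout; hardware may wait longer per attempt, so treat the result as an estimate rather than a guarantee. The function names are made up for illustration and are not part of any library.

```python
# Sketch only: estimates failure-detection time from the ACK timeout exponent
# and retry count. All function names here are hypothetical.

BASE_USEC = 4.096  # nominal timeout unit; timeout = BASE_USEC * 2**exponent


def ack_timeout_sec(exponent: int) -> float:
    """Nominal ACK timeout in seconds for a 5-bit exponent (1..31)."""
    if not 1 <= exponent <= 31:
        raise ValueError("exponent must be in 1..31 (0 means no timeout)")
    return BASE_USEC * (1 << exponent) / 1e6


def detect_time_sec(exponent: int, retry_cnt: int) -> float:
    """Estimated time until the connection is reported as failed.

    Assumption: the initial send plus retry_cnt retries each time out once.
    """
    if not 0 <= retry_cnt <= 7:
        raise ValueError("retry_cnt must be in 0..7")
    return ack_timeout_sec(exponent) * (retry_cnt + 1)


def largest_exponent_within(budget_sec: float, retry_cnt: int):
    """Largest exponent whose estimated detection time fits the budget.

    Returns None if even the smallest exponent exceeds the budget.
    """
    best = None
    for exponent in range(1, 32):
        if detect_time_sec(exponent, retry_cnt) <= budget_sec:
            best = exponent
    return best


if __name__ == "__main__":
    # Exponent 18 is roughly the "~1 sec" timeout from the quoted message.
    print(f"exp=18, retry=7: {detect_time_sec(18, 7):.2f} s")
    # What would fit in a 1-second detection budget with 7 retries?
    exp = largest_exponent_within(1.0, 7)
    print(f"1 s budget, retry=7: exp={exp}, {detect_time_sec(exp, 7):.2f} s")
```

Under these assumptions, a ~1 sec timeout with retry count 7 works out to several seconds before the failure is reported, which matches what you are seeing. Fitting a 1-second budget means lowering the exponent, the retry count, or both, at the cost of tearing down connections on shorter transient outages.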
