Re: [ofa-general] RE: How fast to get RDMA_CM_EVENT_DISCONNECTED ?

Roland Dreier Wed, 11 Apr 2007 20:48:22 -0700

 > Yes. Internally in A, if the number of receives exceeds lowwater (4), an ACK
 > will be sent back. I assume the ACK is not triggered at that moment.
 > When A is trying to receive a message from B and the message never
 > shows up, A actually sends a heartbeat back to B; however, it takes
 > several seconds for this heartbeat to complete with an error (we
 > configure a timeout of ~1 sec and a retry count of 7).
 > 
 > Several seconds to detect a connection failure is not acceptable for us,
 > so if I use rdmacm, I want to know whether I can detect the connection
 > failure faster than with the heartbeat message.


I think there is an internal contradiction in what you're doing here.
If your (ACK timeout) * (retry count) exceeds the time that you
consider acceptable to detect a failure, then you've set your
connection up wrong.  It's not even meaningful to talk about a
connection failing faster than this amount of time -- a connection
will recover from a transient network failure that resolves itself
before the last retry fails, and without a time machine it's
impossible to say whether a network failure will or will not be
resolved 7 seconds into the future.

Certainly if you receive a disconnect request, then you know the
remote side is really and truly gone.  But if you've set your
timeouts/retry counts so that connections will take 7 seconds to
fail after an event like a link going down, then there's no way to
detect that failure before it occurs.

It seems to me the solution is to reduce your timeout and/or retry
count so that connections fail within the time scale that you require.
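
To make the arithmetic concrete, here is a rough sketch in C of the
(ACK timeout) * (retry count) budget discussed above. It assumes the usual
InfiniBand encoding of the local ACK timeout as an exponent, i.e.
4.096 usec * 2^exp, which is what the timeout field of struct ibv_qp_attr
holds. The function names are made up for illustration, and the product is
only the simple estimate used in this thread; real HCAs may take longer than
the nominal value.

```c
#include <assert.h>

/* Nominal ACK timeout in microseconds for an exponent in 1..31.
 * Assumes the IB encoding 4.096 usec * 2^exp (exp == 0 means
 * "no timeout" and is deliberately not handled here). */
static double ack_timeout_usec(unsigned exp)
{
    assert(exp >= 1 && exp <= 31);
    return 4.096 * (double)(1ULL << exp);
}

/* Simple estimate of how long a dead connection takes to fail:
 * (ACK timeout) * (retry count), as in the discussion above. */
static double worst_case_detect_usec(unsigned exp, unsigned retry_cnt)
{
    return ack_timeout_usec(exp) * (double)retry_cnt;
}

/* Largest exponent whose estimated failure time fits in budget_usec,
 * or -1 if even the smallest exponent is too slow.
 * Hypothetical helper, not part of any library. */
static int largest_timeout_exp(double budget_usec, unsigned retry_cnt)
{
    int best = -1;
    for (unsigned exp = 1; exp <= 31; exp++) {
        if (worst_case_detect_usec(exp, retry_cnt) <= budget_usec)
            best = (int)exp;
    }
    return best;
}
```

With these assumptions, a timeout exponent of 18 is about 1.07 sec, so a
retry count of 7 gives roughly 7.5 sec before the failure is reported.
Asking largest_timeout_exp() for a 1 sec budget with 7 retries yields an
exponent of 15 (about 0.13 sec per attempt, about 0.94 sec in total).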

 - R.