Hi,

>> when setting the timeout in a struct ibv_qp_attr, this value
>> corresponds to the Local ACK timeout, which according to the InfiniBand
>> spec defines the transport timer timeout by the formula
>> 4.096 us * 2^(Local ACK Timeout). Is this right?
>> And is there a value for this timeout that is considered "good practice"?
>>
> This value depends on your fabric size, on the HCA you have, and on a few
> other factors.
>
>> Also, in a client-server setup, if this timeout is set to a "big
>> value" (like 30) and the server dies, the client will take that
>> amount of time to realize the failure. Is this correct?
>>
> Yes. After (at least) the calculated time multiplied by retry_count, the
> sender QP will get a retry-exceeded error
> (if a send request (SR) was posted without any response from the receiver).
>
Hmm... is there no workaround for this situation? I mean, if the server
dies, isn't there any way for the sender/client to realize it sooner? If
the timeout is too large, this can be cumbersome.
I tried reducing the timeout, and the client does realize sooner when the
server exits, but another problem arises: even without the server exiting,
after some hours the client gets a retry-exceeded error when polling for
the completion of a recently posted send.

Thank you for the help.
Rui
_______________________________________________
general mailing list
[email protected]
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
