Hi,

>>
>> when setting the timeout in a struct ibv_qp_attr, this value
>> corresponds to the Local ACK Timeout, which according to the InfiniBand
>> spec defines the transport timer timeout given by the formula
>> "4.096 usec * 2^(Local ACK Timeout)". Is this right?
>> And is there a value for this timeout to be considered "good practice"?
>>
> This value depends on your fabric size, on the HCA you have, and on some
> other factors.
>> Also, in a client-server setup, if this timeout is set to a "big
>> value" (like 30) when the server dies, the client will take that
>> amount of time to realize the failure. Is this correct?
>>
> Yes, after at least (the calculated time * retry_count) usec, the
> sender QP will get a "retry exceeded" error
> (if an SR was posted and got no response from the receiver).
>
Hmm... is there no workaround for this situation? I mean, if the
server dies, is there no way for the sender/client to detect it
sooner? If the timeout is too large, this can be cumbersome.

I tried reducing the timeout, and the client does indeed notice sooner
when the server exits. But another problem arises: even though the
server is still running, after some hours the client gets a "retry
exceeded" error when polling for the completion of a recently posted
send.
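
For a rough sense of the numbers involved, here is a minimal sketch of the
arithmetic discussed above. It only applies the formula quoted in this thread
and the "timeout * retry count" lower bound from the reply; the function names
are made up for illustration and are not part of libibverbs. I am assuming the
inputs are the values one would put in the timeout and retry_cnt fields of
struct ibv_qp_attr, and that a timeout of 0 means the timer is disabled.

```c
#include <stdint.h>
#include <stdio.h>

/* Local ACK timeout in microseconds: 4.096 usec * 2^timeout.
 * 'timeout' is the 5-bit value (0..31) set in the QP attributes.
 * Assumption: 0 means "timer disabled"; this sketch returns 0.0 for it
 * and for out-of-range values. */
double ack_timeout_usec(uint8_t timeout)
{
    if (timeout == 0 || timeout > 31)
        return 0.0;
    return 4.096 * (double)(1ULL << timeout);
}

/* Lower bound (in microseconds) on the time before the sender QP can
 * report "retry exceeded": calculated timeout * retry count.
 * The real delay may be longer; this is only the "at least" figure. */
double min_failure_detect_usec(uint8_t timeout, uint8_t retry_cnt)
{
    return ack_timeout_usec(timeout) * (double)retry_cnt;
}

/* Hypothetical helper: print both figures in seconds. */
void print_estimate(uint8_t timeout, uint8_t retry_cnt)
{
    printf("timeout=%u retry_cnt=%u: per-try %.3f s, at least %.3f s total\n",
           (unsigned)timeout, (unsigned)retry_cnt,
           ack_timeout_usec(timeout) / 1e6,
           min_failure_detect_usec(timeout, retry_cnt) / 1e6);
}
```

Calling print_estimate(30, 7) and print_estimate(14, 7) side by side shows the
trade-off: the large value means more than an hour per try, while the small
one detects a dead peer in under a second but leaves far less slack for a
slow or congested fabric.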

Thank you for the help.


Rui
