Rui Machado wrote:
2008/5/16 Roland Dreier <[EMAIL PROTECTED]>:
 > Hmm... and is there no workaround for this situation? I mean, if the
 > server dies, isn't there any way for the sender/client to realize
 > this? If the timeout is too large, this can be cumbersome.
 >
 > I tried reducing the timeout, and indeed the client realizes faster
 > when the server exits, but another problem arises: without the server
 > exiting, on the client side I get the error (retry exceeded) when
 > polling for a recently posted send. This happens after some hours.

There's a tradeoff between detecting real failures faster, and reducing
false errors detected because a response came too slowly.

Clearly if a response may take an amount of time 'X' to be received
under normal conditions, there's no way to conclude that the remote side
has failed without waiting at least 'X'.


I understand. So there's no real difference between the two
situations, a real server failure or just a load problem that takes more
time?
From the sender QP's point of view, they are the same (an ack/nack wasn't
received within a specific period of time).
Something like a different error or a SIGPIPE :) ?

I will describe my situation, maybe it helps (bear with me, as I'm
just starting with InfiniBand and so on).
I have a client and a server. The client posts RDMA calls one at a
time (post, poll, post...), so the server is just there.
If I start something like 16 clients on 1 machine, after a few
hours I get an error in some client programs (retry exceeded) with
a timeout of 14. If I increase the timeout to 32, I don't see that
error, but if I stop the server, the clients take a long time to
notice, which is also not wanted.
That's why I asked if there is a 'good value'. If I have such a load
between 2 nodes, I always have to risk that if the server dies the
client will take a long time to see it. That's not nice!
Did you try to increase the retry_count too?
(and not only the timeout).
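
To make the timeout/retry_count tradeoff concrete, here is a rough sketch of
how the two QP attributes combine into a failure-detection time. It assumes
the nominal Local ACK Timeout encoding of 4.096 us * 2^timeout and that the
request is sent once and then retried retry_cnt times; the function names are
made up for illustration, and real HCAs may wait longer than the nominal
value, so treat the results as approximate lower bounds rather than measured
figures.

```python
# Sketch: estimate how long an RC QP needs before it reports "retry exceeded".
# Assumption: each attempt waits about 4.096 us * 2**timeout (the nominal
# Local ACK Timeout). Hardware may wait longer than this nominal value.

BASE_US = 4.096  # microseconds, base unit of the Local ACK Timeout encoding


def ack_timeout_seconds(timeout: int) -> float:
    """Nominal Local ACK Timeout for a 5-bit `timeout` value (1..31)."""
    if not 1 <= timeout <= 31:
        raise ValueError("timeout must be 1..31 (0 means wait forever)")
    return BASE_US * (2 ** timeout) / 1e6


def detection_time_seconds(timeout: int, retry_cnt: int) -> float:
    """Approximate time until the posted send completes in error.

    The request is sent once and then retried `retry_cnt` times (0..7),
    so roughly retry_cnt + 1 timeout periods elapse.
    """
    if not 0 <= retry_cnt <= 7:
        raise ValueError("retry_cnt must be 0..7")
    return (retry_cnt + 1) * ack_timeout_seconds(timeout)


if __name__ == "__main__":
    # Hypothetical combinations, chosen only to compare detection times.
    for timeout, retry_cnt in [(14, 7), (16, 1), (17, 0), (18, 7)]:
        total = detection_time_seconds(timeout, retry_cnt)
        print(f"timeout={timeout:2d} retry_cnt={retry_cnt} -> {total:.3f} s")
```

Under these assumptions, timeout=14 with retry_cnt=7 and timeout=17 with
retry_cnt=0 give the same overall detection time; they differ in how long a
single response may take before a retransmission is triggered. Which split
works better probably depends on whether the errors come from lost packets
or from slow responses.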

By the way, which RDMA operation do you execute: READ or WRITE?
Thanks for the help and quick answers,
You are always welcome ..
Dotan