[Lustre-discuss] Too many client eviction

DEGREMONT Aurelien Tue, 03 May 2011 05:59:56 -0700

Hello

We often see some of our Lustre clients being evicted for no apparent 
reason (the clients seem healthy).
The pattern is always the same:


All of this happens on Lustre 2.0, with adaptive timeouts enabled.

1 - A server complains about a client:
### lock callback timer expired... after 25315s...
(nothing on client)

(few seconds later)

2 - The client receives -107 (ENOTCONN) in reply to an obd_ping for this target
(server says "@@@processing error 107")

3 - The client realizes its connection was lost.
It notices it was evicted.
It reconnects.
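
To see how often step 1 occurs and how long the server reports having waited each time, the server logs could be scanned with something like the sketch below. It is only an illustration: the regular expression is a loose guess built from the excerpt above (the full message format is not shown here), and the function name `expired_waits` and the sample lines are hypothetical.

```python
import re

# Loose pattern: only the "lock callback timer expired ... after <N>s" part
# comes from the log excerpt above; the rest of the message format is an
# assumption, so the match is deliberately permissive.
EXPIRED_RE = re.compile(r"lock callback timer expired.*?after (\d+)s")


def expired_waits(lines):
    """Return the wait, in seconds, reported by each matching log line."""
    waits = []
    for line in lines:
        match = EXPIRED_RE.search(line)
        if match:
            waits.append(int(match.group(1)))
    return waits


if __name__ == "__main__":
    # Hypothetical sample lines, not real server output.
    sample = [
        "### lock callback timer expired after 25315s",
        "@@@ processing error 107",
        "### lock callback timer expired after 120s",
    ]
    waits = expired_waits(sample)
    print(len(waits), "expirations, longest wait:", max(waits), "s")
```

Comparing the reported waits with the configured timeouts may help tell a slow client apart from a callback that never reached it.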

Just to be sure: when a client is evicted, all of its in-flight I/O is 
lost and no recovery is attempted for it, correct?

We are thinking of increasing the timeouts to give clients more time to 
answer the ldlm lock revocation
(maybe they are just too loaded).
- Is increasing ldlm_timeout enough to do so?
- Do we also need to change obd_timeout accordingly? Is there a risk of 
triggering new (cascading) timeouts if we change only ldlm_timeout?

Any feedback in this area is welcome.

Thank you

Aurélien Degrémont
