On Mon, 2008-09-22 at 16:17 -0400, Ms. Megan Larko wrote:
> Hello All,
> 
> I honestly do not know how it happened, but the value in
> /proc/sys/lustre/timeout on the OSS box was set to 100.   All other
> systems were set to 1000.

FWIW, 1000 is way too high.  Even our biggest production systems
(thousands, if not tens of thousands, of nodes) don't use values higher
than 300 seconds.  You might want to try lowering that value to 300
seconds (on all nodes, of course!) and see if the system stays stable.
You might also want to experiment with even lower values (100s is the
default) and see how low you can go while maintaining stability.  The
downside of a high obd_timeout is long recovery times.
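
Since the original problem was one node carrying a different timeout
from the rest, a quick consistency check may help.  What follows is only
a rough sketch in Python, not an official Lustre tool: read_timeout()
and find_mismatches() are names made up for illustration, the host names
in the usage note are hypothetical, and how you gather the values from
each node (ssh, pdsh, etc.) is left out.  The only thing taken from this
thread is the /proc/sys/lustre/timeout path.

```python
from collections import Counter
from pathlib import Path

# Path as quoted in this thread; pass a different path to test elsewhere.
TIMEOUT_PATH = "/proc/sys/lustre/timeout"


def read_timeout(path=TIMEOUT_PATH):
    """Return the timeout (in seconds) stored in the given file."""
    return int(Path(path).read_text().strip())


def find_mismatches(timeouts):
    """Take {hostname: timeout} and return (most common value, outliers).

    Outliers are the hosts whose timeout differs from the most common
    value.  Collecting the per-host values is up to the caller.
    """
    if not timeouts:
        return None, {}
    majority, _count = Counter(timeouts.values()).most_common(1)[0]
    outliers = {host: t for host, t in timeouts.items() if t != majority}
    return majority, outliers
```

For example, find_mismatches({"mds1": 1000, "oss1": 100, "client1": 1000})
returns (1000, {"oss1": 100}), which is the situation described in the
quoted message.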

b.


_______________________________________________
Lustre-discuss mailing list
[email protected]
http://lists.lustre.org/mailman/listinfo/lustre-discuss
