On Mon, 2008-09-22 at 16:17 -0400, Ms. Megan Larko wrote:
> Hello All,
>
> I honestly do not know how it happened, but the value in
> /proc/sys/lustre/timeout on the OSS box was set to 100. All other
> systems were set to 1000.
FWIW, 1000 is waaaaay high. Our biggest production systems (thousands, if not tens of thousands, of nodes) don't use values higher than 300 seconds. You might want to try lowering that value to 300 seconds (on all nodes, of course!) and see whether things stay stable. You might also want to experiment with even lower values (100s is the default) and see how low you can go while still maintaining stability. The downside of high obd_timeouts is long recovery times.

b.
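Since the timeout has to match on all nodes, a quick consistency check can help spot a stray value like the 100 on the OSS. Below is a minimal Python sketch, not an official Lustre tool. The path /proc/sys/lustre/timeout comes from the original message; the helper names (read_timeout, find_mismatches) and the host names in the usage note are hypothetical, and how you gather the values from each node (ssh, pdsh, etc.) is left to you.

```python
from collections import Counter
from pathlib import Path


def read_timeout(path="/proc/sys/lustre/timeout"):
    """Return the timeout in seconds stored in the given file.

    The default path is the one mentioned in the thread; run this
    on each node, or pass another path for testing.
    """
    return int(Path(path).read_text().strip())


def find_mismatches(timeouts):
    """Compare per-node timeouts.

    timeouts: dict mapping a host name to its timeout in seconds.
    Returns (majority_value, outliers), where outliers is a dict of
    the hosts whose value differs from the most common one.
    """
    if not timeouts:
        return None, {}
    # The most common value is treated as the intended setting.
    majority, _count = Counter(timeouts.values()).most_common(1)[0]
    outliers = {host: value for host, value in timeouts.items()
                if value != majority}
    return majority, outliers
```

For example, calling find_mismatches({"mds1": 1000, "oss1": 100, "client1": 1000}) would flag oss1 as the odd one out. Note that the sketch only reports inconsistency; it does not say whether the majority value itself is sensible.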
_______________________________________________
Lustre-discuss mailing list
[email protected]
http://lists.lustre.org/mailman/listinfo/lustre-discuss
