On Wed, 2008-08-06 at 10:41 -0600, Chris Worley wrote:
>
> Is there anything in /proc or /sys I can look at to see whatever
> "keepalive" parameters are setup?

All timeouts are based on the obd_timeout in /proc/sys/lustre/timeout
which MUST be the same on all nodes.

> The systems aren't dying. They are failing to communicate with the
> MDS for some reason.

Network problems perhaps? You could try enabling +rpctrace debug and
inspecting the debug file for RPCs to see if the client is indeed
sending something (even if it's a ping) at regular intervals.

b.
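Since the timeout has to match on every node, a quick consistency check may be worth running before digging into the network. The following is only a rough sketch in Python: it reads the local value from /proc/sys/lustre/timeout (the path named above) and compares values gathered from several nodes. The helper names and the sample node names are hypothetical, and how the per-node values get collected (ssh, pdsh, or similar) is left out.

```python
from collections import defaultdict
from pathlib import Path

# Path named in the message above; it exists only where Lustre is loaded.
TIMEOUT_PATH = Path("/proc/sys/lustre/timeout")


def read_local_timeout(path=TIMEOUT_PATH):
    """Return this node's obd_timeout in seconds, or None if unreadable."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return None


def group_by_timeout(timeouts):
    """Map each distinct timeout value to the sorted nodes reporting it."""
    groups = defaultdict(list)
    for node, value in timeouts.items():
        groups[value].append(node)
    return {value: sorted(nodes) for value, nodes in groups.items()}


def timeouts_consistent(timeouts):
    """True when every node reports the same timeout value."""
    return len(set(timeouts.values())) <= 1


if __name__ == "__main__":
    print("local obd_timeout:", read_local_timeout())

    # Hypothetical values, as if collected from each node by hand.
    collected = {"mds1": 100, "oss1": 100, "client7": 300}
    if not timeouts_consistent(collected):
        for value, nodes in sorted(group_by_timeout(collected).items()):
            print(f"timeout={value}: {', '.join(nodes)}")
```

Any node that ends up in a group of its own is the first one to look at.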
_______________________________________________ Lustre-discuss mailing list [email protected] http://lists.lustre.org/mailman/listinfo/lustre-discuss
