On Wed, 2008-08-06 at 10:41 -0600, Chris Worley wrote:
> 
> Is there anything in /proc or /sys I can look at to see whatever
> "keepalive" parameters are setup?

All timeouts are based on obd_timeout, which is set in
/proc/sys/lustre/timeout and MUST be the same on all nodes.
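
To check that the value really is the same everywhere, something like the following rough sketch might help. It assumes the file holds a single integer (seconds). The helper names and the idea of collecting the values into a dict keyed by node name are only illustrative; how you gather the values from each node (ssh, pdsh, etc.) is up to you.

```python
from pathlib import Path


def read_timeout(path="/proc/sys/lustre/timeout"):
    """Return the obd_timeout value stored in the given file.

    Assumes the file contains a single integer.
    """
    return int(Path(path).read_text().strip())


def mismatched_nodes(timeouts):
    """Given {node_name: timeout}, return the nodes whose value differs
    from the most common value across all nodes."""
    if not timeouts:
        return {}
    values = list(timeouts.values())
    # Treat the most frequent value as the expected one.
    expected = max(set(values), key=values.count)
    return {node: t for node, t in timeouts.items() if t != expected}
```

An empty result from mismatched_nodes() means every node reported the same timeout.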

> The systems aren't dying.

They are failing to communicate with the MDS for some reason.  Network
problems perhaps?  You could try enabling +rpctrace debug and inspecting
the debug file for RPCs to see if the client is indeed sending something
(even if it's a ping) at regular intervals.
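
Once you have pulled the RPC timestamps out of the debug file, checking for "regular intervals" can be as simple as the sketch below. Extracting the timestamps is not shown, since the debug log layout depends on your Lustre version; find_gaps and max_interval are hypothetical names, and the threshold should be whatever interval you expect between client RPCs.

```python
def find_gaps(timestamps, max_interval):
    """Return (earlier, later) timestamp pairs where consecutive RPCs
    are more than max_interval seconds apart."""
    ordered = sorted(timestamps)
    return [
        (a, b)
        for a, b in zip(ordered, ordered[1:])
        if b - a > max_interval
    ]
```

Any pair it returns is a stretch where the client sent nothing for longer than expected.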

b.

