On Wed, Aug 6, 2008 at 10:45 AM, Brian J. Murrell <[EMAIL PROTECTED]> wrote:
> On Wed, 2008-08-06 at 10:41 -0600, Chris Worley wrote:
>>
>> Is there anything in /proc or /sys I can look at to see whatever
>> "keepalive" parameters are setup?
>
> All timeouts are based on the obd_timeout in /proc/sys/lustre/timeout
> which MUST be the same on all nodes.
>
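For checking the "same on all nodes" requirement, a minimal sketch in Python follows. It assumes the file at the path quoted above holds a single integer number of seconds; the helper names `read_timeout` and `mismatched_nodes` are made up for illustration, and how the values get collected from each node (ssh, pdsh, or anything else) is left to the caller.

```python
from collections import Counter
from pathlib import Path

# Path quoted in the thread; it exists only where the Lustre modules are loaded.
TIMEOUT_PATH = Path("/proc/sys/lustre/timeout")


def read_timeout(path=TIMEOUT_PATH):
    """Return the timeout in seconds, or None if it cannot be read.

    Assumption: the file contains a single integer.
    """
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return None


def mismatched_nodes(timeouts):
    """Return sorted node names whose timeout differs from the most common value.

    `timeouts` maps a node name to the value read on that node.
    """
    if not timeouts:
        return []
    common, _ = Counter(timeouts.values()).most_common(1)[0]
    return sorted(node for node, value in timeouts.items() if value != common)


if __name__ == "__main__":
    # Prints None on a machine without Lustre.
    print("local obd_timeout:", read_timeout())
```

Feeding it something like `{"mds": 100, "client1": 100, "client2": 300}` would single out `client2` as the node to look at first.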
Would you suggest I increase or decrease this value? Is there a way to
inhibit the eviction, or is that necessary to keep really dead clients
from locking out files?

>> The systems aren't dying.
>
> They are failing to communicate with the MDS for some reason. Network
> problems perhaps? You could try enabling +rpctrace debug and inspecting
> the debug file for RPCs to see if the client is indeed sending something
> (even if it's a ping) at regular intervals.

All the systems (RHEL4 and 5 clients, Lustre servers) are on the same
Ethernet and IB switches. There were no issues before the 1.6.5.1
upgrade with the RHEL5 nodes.

Would a normal ping do it? I can jury-rig all the RHEL5 nodes to ping
the MDS.

Chris
_______________________________________________
Lustre-discuss mailing list
[email protected]
http://lists.lustre.org/mailman/listinfo/lustre-discuss
