Hello all,

I honestly do not know how it happened, but the value in /proc/sys/lustre/timeout on the OSS was set to 100, while all of the other systems were set to 1000. I changed the value on the OSS to 1000, and the error messages on all of the related systems stopped. The idea to re-check came from an e-mail by Brian Murrell, archived on os-dir, about bug 16237. Brian listed a mismatched timeout as another thing to check.
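For anyone who wants to catch this kind of mismatch sooner, here is a minimal Python sketch. It assumes you have already collected the contents of /proc/sys/lustre/timeout from each node (how you gather them, via ssh, pdsh or similar, is left out). It flags every node whose value differs from the most common one. The host names and the helper functions are hypothetical, made up for illustration only.

```python
from collections import Counter
from pathlib import Path

# Path quoted in the post; it may not exist on every Lustre version.
TIMEOUT_PATH = Path("/proc/sys/lustre/timeout")


def read_timeout(path: Path = TIMEOUT_PATH) -> int:
    """Hypothetical helper: read the timeout value from a /proc-style file."""
    return int(path.read_text().strip())


def find_mismatches(timeouts: dict[str, int]) -> dict[str, int]:
    """Hypothetical helper: return nodes whose timeout differs from the majority.

    `timeouts` maps a host name to the value read on that host.
    """
    if not timeouts:
        return {}
    # Treat the most common value across the cluster as the expected one.
    majority, _ = Counter(timeouts.values()).most_common(1)[0]
    return {host: t for host, t in timeouts.items() if t != majority}


if __name__ == "__main__":
    # Hypothetical host names mirroring the situation described above.
    observed = {"mds1": 1000, "oss1": 100, "client1": 1000, "client2": 1000}
    print(find_mismatches(observed))  # {'oss1': 100}
```

Taking the majority value as "expected" is just a convenient default; if you know the intended cluster-wide timeout, comparing against that number directly is simpler and avoids surprises when most nodes are wrong.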
Interestingly, the readahead on the same OSS (as reported by blockdev --report /dev/sdX) was set to 672. I have no idea where that came from either; all of the other systems report a readahead of 256. I had changed the readahead on the OSS first (blockdev --setra 256 /dev/sdX), but the error messages did not stop until I fixed the value in /proc/sys/lustre/timeout.

How could /proc end up with such odd values? For now I will see whether the change holds. I may have to do something to make it persist across reboots.

Cheers!
megan
