So, 14 days after the previous reboot of labstore3, the same symptoms occurred again today around 18h UTC.
This time, however, we found a likely culprit: a known bug in the RPC scheduler[1], which paravoid confirmed by dumping a stack trace of the live system before we rebooted it. The fix has been applied to the 3.8 kernel tree, so we upgraded labstore3 to linux-image-3.8.0-26-generic before rebooting.

The NFS server is operational again at this time. Provided this fixes the bug (which is almost certain given the stack traces), in two weeks we will return the tunables we had changed while trying to isolate the problem to their better-performing values.

Thank you all for your patience,

-- Marc
