So, 14 days after the previous reboot of labstore3, the same symptoms
occurred again today around 18h UTC.

This time, however, we found a likely culprit: a known bug in the
RPC scheduler[1], which paravoid confirmed by dumping a stack trace of
the live system before we rebooted it.  The patch was applied to the 3.8
kernel tree, so we upgraded labstore3 to linux-image-3.8.0-26-generic
before rebooting.

The NFS server is operational again at this time.

Provided this fixes the bug (which is almost certain given the stack
traces), in two weeks we will restore the tunables we changed while
trying to isolate the problem to their previous, better-performing
values.

Thank you all for your patience,

-- Marc

_______________________________________________
Labs-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/labs-l