Bleh. So, it turns out that, in all likelihood, last night's reboot of the NFS server caused some corruption in the underlying filesystem, which started to degrade quickly after 12:00 UTC to the point of unusability.
Attempts to repair the filesystem did not succeed, so to get things usable again I copied its contents from a slightly older snapshot (~1.5h) to a new filesystem and substituted that for the previous one. That maneuver restored functionality but may require a restart of the instances using NFS (the tools project has already been restarted for that purpose).

We do not yet know exactly what caused the initial corruption, but the broken filesystem and its snapshots have been kept so that I can investigate. In the meantime, the NFS server has been switched to ext4 storage so that, if the issue lies in the interaction between XFS (the previous filesystem) and the block storage, it should not recur.

-- Marc

_______________________________________________
Labs-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/labs-l
