So, a status update: Labs NFS is now on new hardware, and functions properly without the disk controller stalls that plagued the previous server. (yeay!)
On the downside, because we were not completely sure whether the problem was caused by a faulty controller or by a regression in the driver, the new install was (purposefully) very paranoid and downgraded the kernel to 3.2. This removed a few features as a side effect and caused one unanticipated problem: changing file ownership no longer works properly, even as root[1]. As a result, every new tool account requires manual intervention, and 'take' no longer works.

That problem can be fixed in one of two ways: either we upgrade the kernel back to a version that properly supports our setup, or we change the way service groups are set up, which we have been intending to do for a while. The former is a low-impact change that does not require rebooting any instances, and it will probably be the first thing we try. With a bit of luck on our side, that will fix the issue with no disruption. The change in service group setup is on our roadmap /anyway/, since it will fix a number of limitations and problems in our infrastructure (mostly invisible to Labs). If we can, however, we will wait until we move Labs to our primary data center before doing it, so as to minimize disruption.

More news to come,

-- Marc

[1] For the curious: because usernames and user IDs do not match between projects, we have to use UID-based security with NFSv4 rather than the default principal-based security. Kernel versions before 3.5 only partially support this. It works, but it fails to recognize UID 0 as the superuser.

_______________________________________________
Labs-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/labs-l
