[Labs-l] Filesystem maintenance aborted

Marc A. Pelletier Thu, 15 Jan 2015 12:09:37 -0800

Hello Labs,

The maintenance, due today, was started and then aborted after two hours, since only roughly 2% of the necessary copy was done after that interval - at that rate, the partial outage might have lasted well over four days.
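
As a rough check of that estimate, here is a minimal sketch that assumes the copy rate would have stayed constant (a simplification; the real rate could vary). The function name is purely illustrative and is not part of any Labs tooling:

```python
def estimate_total_hours(elapsed_hours: float, fraction_done: float) -> float:
    """Linearly extrapolate the total duration, assuming a constant copy rate."""
    if not 0 < fraction_done <= 1:
        raise ValueError("fraction_done must be in (0, 1]")
    return elapsed_hours / fraction_done


# Figures from the message above: roughly 2% copied after two hours.
total_hours = estimate_total_hours(2.0, 0.02)
total_days = total_hours / 24
print(f"{total_hours:.0f} hours, about {total_days:.1f} days")
```

Under that assumption the full copy would have taken on the order of 100 hours, which matches the "well over four days" figure.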

The unexpected lack of performance was caused by the fact that labs storage does not currently have sufficient elbow room to make a duplicate of the data over a contiguous area of the disk array - causing performance much lower than what was observed during testing.

We have a new storage shelf on order that should be put in production fairly soon (weeks); rather than add the storage this provides immediately, I'll be able to use it to make an offline copy of the Labs storage /prior/ to the next attempt at switching the filesystems over to the new scheme - which I will schedule some time in the future.

The existing filesystem behaved as expected and was properly read-only during the two hours of partial outage, and has now been restored to full read-write.

In the meantime, there should be no lasting effect from the partial outage - in particular, the notes about existing open files becoming stale are not applicable, since the filesystem was not switched. No tool or service needs to be restarted unless it was otherwise affected by the read-only filesystem.


-- Marc

_______________________________________________
Labs-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/labs-l
