Hello Labs,

The maintenance scheduled for today was started, then aborted after two hours: only roughly 2% of the necessary copy had completed in that interval, which would have meant a partial outage lasting well over four days.
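
As a rough check on that figure, here is a minimal sketch of the extrapolation. It assumes the copy would have continued at a constant rate, which is a simplification, and the function name is purely illustrative:

```python
def estimated_total_hours(elapsed_hours: float, fraction_done: float) -> float:
    """Linear extrapolation of total copy time, assuming a constant copy rate."""
    if not 0 < fraction_done <= 1:
        raise ValueError("fraction_done must be in (0, 1]")
    return elapsed_hours / fraction_done


# Figures from this message: roughly 2% of the copy done after two hours.
total_hours = estimated_total_hours(2.0, 0.02)
total_days = total_hours / 24
print(f"{total_hours:.0f} hours, about {total_days:.1f} days")
```

With these inputs the estimate comes to about 100 hours, or a little over four days of read-only operation.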

The unexpectedly poor performance was caused by the fact that Labs storage does not currently have enough elbow room to make a duplicate of the data over a contiguous area of the disk array, which resulted in performance much lower than what was observed during testing.

We have a new storage shelf on order that should be put into production fairly soon (weeks). Rather than adding the storage it provides immediately, I will first use it to make an offline copy of the Labs storage /prior/ to the next attempt at switching the filesystems over to the new scheme, which I will schedule at a later date.

The existing filesystem behaved as expected: it was properly read-only during the two hours of partial outage and has now been restored to full read-write.

In the meantime, there should be no lasting effect from the partial outage. In particular, the notes about existing open files becoming stale do not apply, since the filesystem was not switched. Tools and services that were not otherwise affected by the read-only filesystem do not need to be restarted.

-- Marc

