Reminder -- I'll be doing this tomorrow, starting about 24 hours from now.

-Andrew


On 9/16/14 10:45 AM, Andrew Bogott wrote:
-- Executive Summary:

Many instances will be rebooted at some point this weekend or next week. The total list of instances subject to reboot is here:

https://wikitech.wikimedia.org/wiki/Virt1006_rebuild

Tools and Beta users can ignore this email.


-- The full story:

Sorry about sending two different IMPORTANT emails this week; we generally try to keep labs crises to a minimum. Indeed, this email is about avoiding a potential crisis.

The labs server known as 'virt1006' has been misbehaving lately. Several times in the last month we've seen instances that live on virt1006 get into inconsistent states during reboot: they reboot and never come back up, or they stay in a perpetual 'rebooting' state.

So far we've been able to rescue such instances, but the misbehavior of a Labs server is very disconcerting. Rather than wait for a full collapse (and resulting sudden death of 50+ VMs) we've decided to migrate all instances off of virt1006 and then either rebuild the system or discard the hardware. Moving an instance off of a server is fairly painless, but it does require a few minutes of downtime and a reboot.

I've spoken to a few of you directly about the reboots; the affected Tools and Deployment-prep instances have already been handled. There are a lot more to go, though. If your instance is stable and has its init scripts set up properly and a reboot is no big deal, then, congratulations! Otherwise, please take whatever steps you need to take to batten down the hatches and get ready for a reboot.

If you need the reboot to happen at a scheduled time while you are standing by, that's totally fine. In that case please schedule a reboot window on this page:

https://wikitech.wikimedia.org/wiki/Virt1006_rebuild

Thanks for your cooperation.

-Andrew


_______________________________________________
Labs-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/labs-l
