Reminder -- I'll be doing this tomorrow, starting about 24 hours from now.
-Andrew
On 9/16/14 10:45 AM, Andrew Bogott wrote:
-- Executive Summary:
Many instances will be rebooted at some point this weekend or next
week. The total list of instances subject to reboot is here:
https://wikitech.wikimedia.org/wiki/Virt1006_rebuild
Tools and Beta users can ignore this email.
-- The full story:
Sorry about sending two different IMPORTANT emails this week; we
generally try to keep Labs crises to a minimum. Indeed, this email is
about avoiding a potential crisis.
The Labs server known as 'virt1006' has been misbehaving lately.
Several times in the last month we've seen instances that live on
virt1006 get into inconsistent states during reboot: they either
never come back up or stay stuck in a perpetual 'rebooting' state.
So far we've been able to rescue such instances, but the misbehavior
of a Labs server is very disconcerting. Rather than wait for a full
collapse (and resulting sudden death of 50+ VMs) we've decided to
migrate all instances off of virt1006 and then either
rebuild the system or discard the hardware. Moving an instance off of
a server is fairly painless, but it does require a few minutes of
downtime and a reboot.
I've spoken to a few of you directly about the reboots; the affected
Tools and Deployment-prep instances have already been handled. There
are a lot more to go, though. If your instance is stable, has its
init scripts set up properly, and can tolerate a reboot, then
congratulations: you don't need to do anything. Otherwise, please
take whatever steps you need to get ready for a reboot.
If you need the reboot to happen at a scheduled time while you are
standing by, that's totally fine. In that case please schedule a
reboot window on this page:
https://wikitech.wikimedia.org/wiki/Virt1006_rebuild
Thanks for your cooperation.
-Andrew
_______________________________________________
Labs-l mailing list
https://lists.wikimedia.org/mailman/listinfo/labs-l