I wound up needing to reboot labvirt1003. It's up now and seems happy; I'm currently restarting all associated instances. Everything should be up and running within the hour. Let me know if you still see issues later in the day.

-Andrew

On 5/20/16 10:10 AM, Andrew Bogott wrote:
Note: Tools users can ignore this message.

We are seeing some unusual behavior on labvirt1003, which hosts a large number of labs instances. The problem is not yet diagnosed, but it is likely a hardware problem that will require reboots or downtime. Here is a complete list of labs instances currently living on labvirt1003:

https://phabricator.wikimedia.org/P3159

If you have any hosts on that box that cannot survive a reboot, please either let me know, or take steps to minimize the damage. I've removed labvirt1003 from the scheduler, so if you want to build a new instance and migrate services to it you can be assured that the new instance will be isolated from the coming chaos.

A simple reboot shouldn't produce more than 5-10 minutes of downtime. If a major outage seems likely, I'll follow up with an additional warning.

-Andrew



_______________________________________________
Labs-announce mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/labs-announce
_______________________________________________
Labs-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/labs-l
