On 5/16/15 7:59 AM, Petr Bena wrote:
Ok, can you give us a list of the instances that were rebooted, so that I
don't have to check my instances one by one to see whether they rebooted?
Thanks
Yes! I'm still writing the documentation, but there's a list of
affected instances at the bottom of the outage report here:
https://wikitech.wikimedia.org/wiki/Incident_documentation/20150515-LabsOutage
On Sat, May 16, 2015 at 2:30 PM, Andrew Bogott <[email protected]> wrote:
This turns out not to have been a heating issue, or at least not entirely --
it was some kind of kernel lockup. Coren and others rebooted the system and
restarted all instances, and things seem to be working fine now. We don't
have much explanation for what caused the problem, though, so we'll be on
the lookout.
-A
On 5/15/15 11:31 PM, Andrew Bogott wrote:
The hardware curse continues!
One of the labs virt hosts (labvirt1003) is running very hot tonight,
presumably due to a broken fan. It is intermittently scaling the CPU speed
way back to avoid melting; when that happens there are bound to be lots of
side effects like unresponsive instances, clock drift, and the like (not
least of which is that right now I can't ssh into the damn thing or get
performance metrics).
Naturally this started happening late on a Friday, so it may be a while
before I can get someone in the datacenter. I'm leaving the host up for
now, based on the notion that half a server is better than none, but
poor performance is likely to be the norm in the meantime.
I did shut off one instance: wikidata-wdq-mm. I don't have a personal
grudge, but it was gobbling CPU cycles and the system really needs a rest.
If loss of that instance is a disaster for anyone, contact me and I'll see
if I can revive it and shut off ten or so other instances to make room.
Updates as events warrant!
-Andrew
_______________________________________________
Labs-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/labs-l