OK, can you give us a list of the instances that were rebooted, so that I don't have to check my instances one by one to see whether they rebooted? Thanks.
On Sat, May 16, 2015 at 2:30 PM, Andrew Bogott <[email protected]> wrote:
> This turns out to not have been a heating issue, or at least not entirely --
> it was some kind of kernel lockup. Coren and others rebooted the system and
> restarted all instances, and things seem to be working fine now. We don't
> have much explanation for what caused the problem, though, so we'll be on
> the lookout.
>
> -A
>
>
> On 5/15/15 11:31 PM, Andrew Bogott wrote:
>>
>> The hardware curse continues!
>>
>> One of the labs virt hosts (labvirt1003) is running very hot tonight,
>> presumably due to a broken fan. It is intermittently scaling the CPU speed
>> way back to avoid melting; when that happens there are bound to be lots of
>> side-effects like unresponsive instances, clock drift, and the like (not
>> least of which is that right now I can't ssh into the damn thing, or get
>> performance metrics.)
>>
>> Naturally this started happening late on a Friday, so it may be a while
>> before I can get someone in the datacenter. I'm leaving the host up in the
>> meantime, based on the notion that half a server is better than none, but
>> poor performance is likely to be the norm in the meantime.
>>
>> I did shut off one instance: wikidata-wdq-mm. I don't have a personal
>> grudge, but it was gobbling CPU cycles and the system really needs a rest.
>> If loss of that instance is a disaster for anyone, contact me and I'll see
>> if I can revive it and shut off ten or so other instances to make room.
>>
>> Updates as events warrant!
>>
>> -Andrew
>
>
>
> _______________________________________________
> Labs-l mailing list
> [email protected]
> https://lists.wikimedia.org/mailman/listinfo/labs-l
