Re: [Labs-l] Outage report

Ryan Lane Fri, 01 Jun 2012 03:40:12 -0700

I'm now going to reboot the instances, since it'll bring the swapping
down for a while.


On Fri, Jun 1, 2012 at 12:24 PM, Ryan Lane <[email protected]> wrote:
> We're currently having a Labs outage. The NFS server became
> non-responsive, causing a cascading failure. I'm suspending instances
> until load comes down. Once load is under control I'll
> slowly resume instances. Soon, we'll be doing the following things to
> ensure this doesn't continue to occur:
>
> 1. We're moving away from glusterfs to local storage on the virtual
> nodes until we find another more appropriate solution
> 2. We're getting rid of the labs-nfs1 instance, and will move the home
> directories to project storage
> 3. We're adding more (and better) hardware, which will lead to less
> swapping, which in turn will lead to less IO
>
> Sorry about the experience as of late. I'm looking forward to
> improving the situation for us.
>
> - Ryan

_______________________________________________
Labs-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/labs-l
