[Labs-l] Outage report

Ryan Lane Fri, 01 Jun 2012 03:24:47 -0700

We're currently having a Labs outage. The nfs server became
non-responsive, causing a cascading failure. I'm suspending instances
until load comes down. Once load is under control, I'll slowly resume
instances. Soon, we'll be doing the following things to ensure this
doesn't continue to occur:


1. We're moving away from glusterfs to local storage on the virtual
nodes until we find another more appropriate solution
2. We're getting rid of the labs-nfs1 instance, and will move the home
directories to project storage
3. We're adding more (and better) hardware, which will lead to less
swapping and, in turn, less IO

Sorry about the experience as of late. I'm looking forward to
improving the situation for us.

- Ryan

_______________________________________________
Labs-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/labs-l
