Re: [Labs-l] Outage report

Ryan Lane Fri, 01 Jun 2012 06:29:34 -0700

Yes. All instances were rebooted. Everything should be working now.

On Fri, Jun 1, 2012 at 2:45 PM, Shujen Chang <[email protected]> wrote:
> all instances? is it ok now?
>
>
> On Friday, June 1, 2012, Ryan Lane wrote:
>>
>> I'm now going to reboot the instances, since it'll bring the swapping
>> down for a while.
>>
>> On Fri, Jun 1, 2012 at 12:24 PM, Ryan Lane <[email protected]> wrote:
>> > We're currently having a Labs outage. The NFS server became
>> > non-responsive, causing a cascading failure. I'm suspending instances
>> > currently, until load comes down. Once load is under control I'll
>> > slowly resume instances. Soon, we'll be doing the following things to
>> > ensure this doesn't continue to occur:
>> >
>> > 1. We're moving away from glusterfs to local storage on the virtual
>> > nodes until we find another more appropriate solution
>> > 2. We're getting rid of the labs-nfs1 instance, and will move the home
>> > directories to project storage
>> > 3. We're adding more (and better) hardware, which will lead to less
>> > swapping and, in turn, less IO
>> >
>> > Sorry about the experience as of late. I'm looking forward to
>> > improving the situation for us.
>> >
>> > - Ryan
>>
>> _______________________________________________
>> Labs-l mailing list
>> [email protected]
>> https://lists.wikimedia.org/mailman/listinfo/labs-l
>
>
>
> --
> Sincerely,
> Shujen Chang


