On Mon, Jul 29, 2013 at 9:29 PM, Jason Dagit <[email protected]> wrote:
>
> On Jul 29, 2013, at 7:02 PM, Austin Seipp <[email protected]> wrote:
>
>> I went ahead and upgraded the RAM on the WWW server because it only
>> took about 30 seconds and was very simple. I'll look into getting swap
>> enabled later tonight.
>
> Thanks for the updates. I'm glad to hear it was an easy fix.
>
> Couple of questions/observations:
>   * Do you know what kind of request would cause it to run out of memory?
> E.g., did someone try to upload a large image, did it just get too many
> concurrent requests, etc.?

I'm unsure at this time, I'm afraid. I'm looking into getting some
better monitoring infrastructure on these machines so we can more
easily correlate outages like this with their potential causes.

>   * In my experience, swap on a server tends to cause timeouts instead of 
> actual failure. A killed process tends to be easier to detect/recover from. 
> It may be better to leave swap disabled.

I imagine this is why it wasn't enabled in the first place. I haven't
enabled swap yet, and I'll hold off in case anyone else has
opinions.

>   * Is it possible to have a wrapper around the mysqld process that restarts 
> it when it fails (and emails an admin)?

Yes, along with some system monitoring, I'd like to have a service
manager that automatically does this (daemontools/god/angel are all
possibilities). There are several solutions available here; we just
need to hash something out (or worst case, I can just do it, but I
hate performing open heart surgery without some approval).
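
To make the wrapper idea concrete, here is a rough Python sketch of a
restart-and-notify loop. It is only an illustration of the general
technique, not how daemontools, god, or angel work internally. The names
supervise and email_admin, the admin address, the local SMTP relay, and
the mysqld path in the usage comment are all assumptions made up for this
example.

```python
import smtplib
import subprocess
import time
from email.message import EmailMessage
from typing import Callable, Sequence


def supervise(cmd: Sequence[str],
              notify: Callable[[str], None],
              max_restarts: int = 5,
              delay_seconds: float = 1.0) -> int:
    """Run cmd in the foreground and restart it whenever it exits abnormally.

    Calls notify() on every abnormal exit. Stops when the command exits
    cleanly (status 0) or after max_restarts restarts. Returns the number
    of restarts performed.
    """
    restarts = 0
    while True:
        # Blocks until the child exits. On POSIX a negative status means
        # the child was killed by a signal (e.g. -9 for SIGKILL).
        status = subprocess.call(list(cmd))
        if status == 0:
            return restarts
        notify(f"{cmd[0]} exited with status {status}")
        if restarts >= max_restarts:
            return restarts
        restarts += 1
        time.sleep(delay_seconds)


def email_admin(message: str, admin: str = "admin@example.org") -> None:
    """Hypothetical notifier: mail the message via an SMTP relay on localhost."""
    msg = EmailMessage()
    msg["Subject"] = "service supervisor alert"
    msg["From"] = admin
    msg["To"] = admin
    msg.set_content(message)
    with smtplib.SMTP("localhost") as smtp:
        smtp.send_message(msg)


# Hypothetical usage (path and flags depend on the actual install):
# supervise(["/usr/sbin/mysqld"], email_admin)
```

The restart cap and delay are there so a process that dies immediately
on startup does not spin forever and flood the admin's inbox. A real
service manager adds much more (logging, daemonizing, dependency
handling), which is why I'd rather pick an existing one.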

> Anyway, thanks for the quick response!
>
> Jason
>
>>
>> On Mon, Jul 29, 2013 at 8:27 PM, Austin Seipp <[email protected]> wrote:
>>> As an update, the WWW server is severely limited on RAM and the cause
>>> of this problem was that the OOM killer hit the database. We're using
>>> InnoDB of course, so the crash should be safe. We also noticed that
>>> the machine has no swap partition.
>>>
>>> I'll be taking new-www and adding RAM and swap to it, hopefully by the
>>> end of the night.
>>>
>>> I'll post to this list for any expected downtimes. I will probably
>>> double the RAM first for good measure and add swap after.
>>>
>>> On Mon, Jul 29, 2013 at 8:07 PM, Austin Seipp <[email protected]> wrote:
>>>> This is now fixed. The MySQL instance went down for some reason and I
>>>> kicked it. I'll investigate it more.
>>>>
>>>> On Mon, Jul 29, 2013 at 8:01 PM, Jason Dagit <[email protected]> wrote:
>>>>> Hello,
>>>>>
>>>>> It looks like the haskell wiki is down.
>>>>>
>>>>> http://www.haskell.org/haskellwiki/Haskell
>>>>>
>>>>> Gives this page:
>>>>> Sorry! This site is experiencing technical difficulties.
>>>>>
>>>>> Try waiting a few minutes and reloading.
>>>>>
>>>>> (Can't contact the database server: Can't connect to local MySQL server
>>>>> through socket '/var/run/mysqld/mysqld.sock' (2) (localhost))
>>>>>
>>>>> You can try searching via Google in the meantime.
>>>>> Note that their indexes of our content may be out of date.
>>>>>
>>>>>
>>>>> Any idea what's wrong?
>>>>>
>>>>> Jason
>>>>>
>>>>> _______________________________________________
>>>>> haskell-infrastructure mailing list
>>>>> [email protected]
>>>>> http://community.galois.com/mailman/listinfo/haskell-infrastructure
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Regards,
>>>> Austin - PGP: 4096R/0x91384671
>>>
>>>
>>>
>>> --
>>> Regards,
>>> Austin - PGP: 4096R/0x91384671
>>
>>
>>
>> --
>> Regards,
>> Austin - PGP: 4096R/0x91384671
>



-- 
Regards,
Austin - PGP: 4096R/0x91384671
