hi all, I have been watching this thread with some interest and had a
comment about mongrel and health checks. I have found that I can have
mongrel serve up a static file as a health check, and that it does so on a
separate thread from a regular Rails request. I tested this by making my
Rails app sleep for a few minutes while watching haproxy for missed health
checks: none. So we use:
option httpchk HEAD /check.txt HTTP/1.0

in our backends, so that mongrel gets a request for a static file, which it
happily serves up concurrently with a regular Rails request. We use nginx in
front of haproxy for regular static files, but the health checks go to
mongrel. It is my understanding that mongrel serves up static files itself
when they exist; anything that misses the filesystem hits Rails routing and
the mutex that prevents concurrent Rails requests, but static files, I
think, bypass that lock. Anyway, this is our strategy and it seems to work
well. I would appreciate any feedback, as I am operating on empirical
testing and a basic reading of the Rails code rather than any hardcore
understanding of haproxy or mongrel.

By the way, if anyone is interested, I have also written some ruby
"drivers" for haproxy that test database up/down state and do failover but
not failback (stonith). Very primitive at this point, but working.

....gg

On Thu, Oct 8, 2009 at 6:42 PM, Ben Fyvie <[email protected]> wrote:
> Thanks everyone for your input.
>
>> Have you considered running passenger (http://www.modrails.com/)? I
>> don't miss mongrel one bit.
>
> Ryan, we had planned on evaluating it after haproxy. It does seem this
> will resolve quite a number of our issues.
>
>> I think you are confusing monit and the script that monit runs when
>> doing a "mongrel stop" in response to a threshold exceeded condition.
>
> Hank, I probably am. I'm a PC guy in a Linux world ;) Our hosting
> provider gave me a quick summary of how they have configured monit for
> us, but I didn't dig into it myself, so I am probably making some
> assumptions that I shouldn't.
>
>> You could change that shell script to do a kill -9 immediately rather
>> than being nice. But you'll still get a 50X if the mongrel was in the
>> middle of a request.
>
> Hank, that would make it quick and painless. It seems we will get the
> 502 anyway, so why not kill it sooner.
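For reference, the kind of backend greg is describing might look like the
following sketch; the backend name, address, port, and timings are
placeholders, not anyone's actual config:

```haproxy
# Sketch only: names, address, and timings are placeholders.
backend rails_mongrels
    # Ask mongrel for a static file; per greg's testing, mongrel can
    # serve this even while its single Rails thread is busy.
    option httpchk HEAD /check.txt HTTP/1.0
    # maxconn 1 queues extra requests in haproxy instead of in mongrel.
    # fall/rise control how many consecutive checks flip the server state.
    server mongrel_5000 127.0.0.1:5000 check inter 5000 fall 3 rise 2 maxconn 1
```

Willy's caution about needing a large "fall" applies when the check has to
go through the Rails mutex; the static-file check above is exactly what
sidesteps that.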
>
>> Another way, if you are doing such frequent health checks, is to have
>> the "mongrel stop" script set a page/file that tells haproxy that the
>> server is now down (described in haproxy mailing list discussions), then
>> wait a couple of seconds before issuing the first "nice" kill command -
>> thus letting haproxy stop sending requests and allowing that last
>> request to finish (depending on how long your average response takes on
>> your mongrels when they are bloated - if it takes 10 seconds for mongrel
>> to complete a response when it is using a ton of memory, then you are
>> probably not going to be able to get all the timing to work out).
>
> Hank, that is a very creative and excellent idea. That would reduce the
> frequency of our 502 errors dramatically.
>
>> haproxy regulates traffic between servers and ensures the queue is
>> served as fast as possible. This means that as soon as one connection is
>> released, another pending one is immediately served, just a few
>> microseconds later. It is fairly possible that mongrel needs to have
>> some idle time between requests to decide to soft-stop. This would be
>> stupid, but everything is possible with this server... The correct way
>> of performing a soft stop is to stop listening first, even if there's a
>> request being processed. But clearly I'm not sure mongrel does it right.
>
> Willy, yes, it is possible it needs some breathing room between requests
> in order to soft-stop. My guess is that mongrel is just being stubborn
> and will gladly take on more requests as long as they keep coming.
>
>> Why on earth is it required to constantly kill that server??? Normally
>> this should only happen in extreme cases, maybe a few times a year or
>> even a few times a month. In these situations, you wouldn't mind that
>> much about one or two 502 responses being sent.
>
> Willy, our app has bloat...major bloat. A mongrel memory footprint for an
> app such as ours should be roughly 90-120MB of memory.
> Our mongrels level out their memory footprint right now at around 365MB,
> and we have certain requests that can cause it to get even larger; thus
> monit calls for a restart to reduce resource consumption. We know the
> bloat problem needs attention, but it will not be addressed for a few
> more months. For now we have added more memory and upped the ceiling
> value that monit uses to determine when to restart a mongrel instance.
> This has reduced the number of mongrel restarts from a few hundred a day
> to about 20 or fewer a day.
>
>> Since this server is limited to only one connection, as long as a
>> request is being served, an HTTP health check cannot be served. This
>> means you have to set a large enough "fall" parameter for your checks in
>> order not to detect a server as down while it's processing a request. Of
>> course this prevents haproxy from quickly detecting a soft stop
>> condition.
>
> Willy, this is good to know...sounds like we should change the health
> check interval back to 5000ms.
>
> Well folks, it turns out that nginx was raising 502 errors as well prior
> to adding haproxy. It was just that nginx was configured to serve a
> 500.html page and not a 502.html page, so when we saw a 502 for the first
> time after installing haproxy, we figured it was a new issue. It turns
> out mongrel has been the source of all of our problems. For the time
> being we have reduced the number of restarts as much as possible as a
> stopgap while we begin investigating the use of Phusion Passenger in our
> environment.
>
> Thanks again for everyone's input. I posted my questions to the mongrel
> mailing list earlier this week and have not received any input there. I'm
> glad the haproxy community was here to help troubleshoot the problem, and
> I apologize that it turned out to be a problem with mongrel and not
> haproxy.
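The "mark it down first, then kill" sequence Hank describes could be
scripted roughly as follows. This is only a sketch: the check-file path,
pidfile location, and timings are assumptions, and it leans on greg's
static /check.txt health check rather than anything mongrel provides.

```ruby
# Sketch of a "mark down first, then kill" stop script: fail the haproxy
# health check, wait for haproxy to notice, then stop mongrel. All paths
# and timings here are assumptions.
require "fileutils"

def process_alive?(pid)
  Process.kill(0, pid)
  true
rescue Errno::ESRCH
  false
end

def soft_stop(check_file:, pid_file:, grace: 5)
  # 1. Make "HEAD /check.txt" fail so haproxy stops sending new requests
  #    once "fall" consecutive checks have missed.
  FileUtils.mv(check_file, "#{check_file}.down", force: true)

  # 2. Give haproxy time to mark the server down and let any in-flight
  #    request finish.
  sleep grace

  # 3. Stop mongrel, escalating to KILL only if TERM is ignored.
  if File.exist?(pid_file)
    pid = File.read(pid_file).to_i
    begin
      Process.kill("TERM", pid)
      sleep 1
      Process.kill("KILL", pid) if process_alive?(pid)
    rescue Errno::ESRCH
      # already gone
    end
  end

  # 4. Put the check file back so the restarted mongrel passes checks.
  FileUtils.mv("#{check_file}.down", check_file, force: true)
end
```

A monit stop program could call something like this instead of killing
directly; the grace period would need to cover at least fall x inter plus
however long a bloated mongrel takes to finish its last response.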
>
> Ben Fyvie
>
> -----Original Message-----
> From: Willy Tarreau [mailto:[email protected]]
> Sent: Wednesday, October 07, 2009 11:53 PM
> To: Ben Fyvie
> Cc: [email protected]
> Subject: Re: 502 errors continue
>
> On Wed, Oct 07, 2009 at 05:21:00PM -0500, Ben Fyvie wrote:
>> Thanks Willy,
>>
>>> it's clear that the server unexpectedly closed the connection or died
>>
>> You are absolutely correct. We've found that the problem has nothing to
>> do with malformed headers, so we don't need to debug that as I had
>> previously thought we might.
>
> OK
>
> (...)
>> After doing all of this we found that we were still getting 502 errors,
>> one of which was this:
>>
>> Oct 7 11:31:28 localhost haproxy[22110]: 127.0.0.1:36222
>> [07/Oct/2009:11:31:27.522] nightingalenotes_staging
>> nightingalenotes_staging/nightingalenotes_5000 0/0/0/-1/861 502 204 - -
>> SL-- 0/0/0/0/0 0/0 "GET
>> /check_session?authenticity_token=fa175d0290443acf8ae9f2cb3d9104098facc3f7
>> HTTP/1.0"
>
> (...) correct analysis below
>
>> So what our problem really comes down to is why mongrel doesn't quietly
>> stop receiving requests after monit issues the initial "kill". (FYI - it
>> is our understanding that calling "mongrel stop" also issues a "kill"
>> command, so there is no nicer way to ask it to shut down.)
>>
>> The thing that really confuses us is that prior to putting haproxy into
>> place we didn't see any problems when a mongrel instance was restarted.
>> This leads us to believe that there must be something haproxy is doing
>> that prevents the mongrel instance from shutting down nicely when issued
>> the "kill" command. Does haproxy keep a constant connection with the
>> server (mongrel in our case)???
>
> No, but one thing is possible: haproxy regulates traffic between servers
> and ensures the queue is served as fast as possible. This means that as
> soon as one connection is released, another pending one is immediately
> served, just a few microseconds later.
> It is fairly possible that mongrel needs to have some idle time between
> requests to decide to soft-stop. This would be stupid, but everything is
> possible with this server... The correct way of performing a soft stop is
> to stop listening first, even if there's a request being processed. But
> clearly I'm not sure mongrel does it right.
>
> Another thing to investigate: why on earth is it required to constantly
> kill that server??? Normally this should only happen in extreme cases,
> maybe a few times a year or even a few times a month. In these
> situations, you wouldn't mind that much about one or two 502 responses
> being sent.
>
> Also, another thing: since this server is limited to only one connection,
> as long as a request is being served, an HTTP health check cannot be
> served. This means you have to set a large enough "fall" parameter for
> your checks in order not to detect a server as down while it's processing
> a request. Of course this prevents haproxy from quickly detecting a soft
> stop condition.
>
> Regards,
> Willy

--
greg gard, psyd
www.carepaths.com
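For anyone curious about the "drivers" mentioned at the top, their basic
shape might be something like this sketch. The class name and probe
interface are illustrative assumptions, as is the idea of flipping the
same static check file that "option httpchk" requests; failover without
failback is out of scope here.

```ruby
# Sketch of a health "driver": probe the database and flip the static
# check file that haproxy's "option httpchk" requests. Names and the
# probe interface are illustrative assumptions.
require "fileutils"

class DbHealthDriver
  # probe: any callable returning true when the database answers.
  def initialize(check_file:, probe:)
    @check_file = check_file
    @probe = probe
  end

  # One check cycle: create the file when the DB is up so haproxy sees
  # 200s; remove it when the DB is down so checks fail and haproxy
  # takes the backend out of rotation.
  def tick
    if @probe.call
      FileUtils.touch(@check_file)
      :up
    else
      FileUtils.rm_f(@check_file)
      :down
    end
  end
end
```

A cron job or loop would call tick every few seconds; never re-creating
the file after a failover is what makes this fail over without failing
back.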

