Thanks, everyone, for your input.

> Have you considered running passenger (http://www.modrails.com/)?  I don't
> miss mongrel one bit.

Ryan, we had planned on evaluating it after haproxy. It does seem this will
resolve quite a number of our issues.

> I think you are confusing monit and the script that monit runs when doing
> a "mongrel stop" in response to a threshold exceeded condition.

Hank, I probably am. I'm a PC guy in a Linux world ;) Our hosting provider
gave me a quick summary of how they have configured monit for us, but I
didn't dig into it myself, so I'm probably making some assumptions I
shouldn't.

> You could change that shell script to do a kill -9 immediately rather than
> being nice. But you'll still get a 50X if the mongrel was in the middle of
> a request.

Hank, that would make it quick and painless. It seems we will get the 502
anyway, so why not kill it sooner.

> Another way, if you are doing such frequent health checks is to have the
> "mongrel stop" script set a page/file that tells haproxy that the server
> is now down (described in haproxy mailing list discussions) then wait a
> couple seconds before issuing the first "nice" kill command - thus letting
> hap stop sending request and allowing that last request to finish
> (depending on how long your avg request response takes on your mongrels
> when they are bloated - if it takes 10 seconds for mongrel to complete a
> response when it is using a ton of memory then you are probably not going
> to be able to get all the timing to work out).

Hank, that is a very creative and excellent idea. That would reduce the
frequency of our 502 errors dramatically.
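For anyone following along, here is a rough sketch of what that stop wrapper might look like. The file paths, the marker-file convention, and the timings are all my assumptions for illustration, not something we actually run today:

```shell
#!/bin/sh
# Hypothetical drain-then-stop helper: mark the server down for haproxy's
# health check, wait for traffic to drain, then stop mongrel politely and
# force it only if needed. All paths and timings below are assumptions.
drain_and_stop() {
    down_file=$1    # file whose presence makes the health-check URL fail
    pid_file=$2     # mongrel pid file
    drain_secs=$3   # time for haproxy to notice and the last request to finish

    touch "$down_file"                 # health check now fails; haproxy marks server down
    sleep "$drain_secs"                # wait for haproxy to stop routing to us
    pid=$(cat "$pid_file")
    kill "$pid" 2>/dev/null            # polite TERM first
    sleep 1
    kill -0 "$pid" 2>/dev/null && kill -9 "$pid"   # force it if still alive
    rm -f "$down_file"                 # clear the marker so the restarted mongrel passes checks
}

# monit's stop program would then call something like:
# drain_and_stop /var/www/health/mongrel_5000.down /var/run/mongrel.5000.pid 3
```

As Hank notes, whether this eliminates the 502s entirely depends on how long a bloated mongrel takes to finish its in-flight request relative to the drain window.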

> haproxy regulates traffic between servers and ensures the queue is served
> as fast as possible. This means that as soon as one connection is
> released, another pending one is immediately served, just a few
> microseconds later. It is fairly possible that mongrel needs to have some
> idle time between requests to decide to soft-stop. This would be stupid,
> but everything is possible with this server... The correct way of
> performing a soft stop is to stop listening first, even if there's a
> request being processed. But clearly I'm not sure mongrel does it right.

Willy, yeah, it's possible it needs some breathing room between requests in
order to soft-stop. My guess is that mongrel is just being stubborn and will
gladly take on more requests as long as they keep coming.

> Why on earth is it required to constantly kill that server ??? Normally
> this should only happen in extreme cases, may be a few times a year or
> even a few times a month. In these situations, you wouldn't mind that much
> about one or two 502 responses being sent.

Willy, our app has bloat... major bloat. A mongrel memory footprint for an
app such as ours should be roughly 90-120MB. Our mongrels currently level
out at around 365MB, and we have certain requests that can push it even
higher, at which point monit calls for a restart to reduce resource
consumption. We know the bloat problem needs attention, but it will not be
addressed for a few more months. For now we have added more memory and
upped the ceiling value that monit uses to determine when to restart a
mongrel instance. This has reduced the number of mongrel restarts from a
few hundred a day to about 20 or fewer a day.
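For context, the monit stanza involved looks roughly like the sketch below. The paths, port, and threshold are illustrative guesses on my part, since our hosting provider manages the real config:

```
# Hypothetical monit stanza -- paths, port, and memory threshold are assumed
check process mongrel_5000 with pidfile /var/run/mongrel.5000.pid
  start program = "/usr/bin/mongrel_rails start -d -e production -p 5000 -P /var/run/mongrel.5000.pid"
  stop program  = "/usr/bin/mongrel_rails stop -P /var/run/mongrel.5000.pid"
  if totalmem > 400.0 MB for 2 cycles then restart
```

Raising that last threshold is what took us from a few hundred restarts a day down to about 20.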

> Since this server is limited to only one connection, as long as a request
> is being served, an HTTP health check cannot be served. This means you
> have to set a large enough "fall" parameter for your checks in order not
> to detect a server as down while it's processing a request. Of course this
> prevents haproxy from quickly detecting a soft stop condition.

Willy, this is good to know... it sounds like we should change the health
check interval back to 5000ms.
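In other words, something like the server line below. The names and numbers are guesses to illustrate the idea, not our production config:

```
# Hypothetical backend entry: with maxconn 1, a mongrel busy on a slow request
# can miss several consecutive checks, so "inter" and "fall" must together
# tolerate the longest request we expect to serve
backend nightingalenotes_staging
    server nightingalenotes_5000 127.0.0.1:5000 check inter 5000 rise 2 fall 5 maxconn 1
```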


Well folks, it turns out that nginx was raising 502 errors as well, prior
to adding haproxy; nginx was simply configured to serve a 500.html page
rather than a 502.html page. So when we saw a 502 for the first time after
installing haproxy, we figured it was a new issue. In fact, mongrel has
been the source of all of our problems. For the time being we have reduced
the number of restarts as much as possible as a stopgap while we begin
investigating the use of Phusion Passenger in our environment.
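For anyone who hits the same confusion: giving 502 its own page in nginx makes it obvious whether an error came from the upstream or from the app itself. A minimal sketch (the page file names are assumed):

```
# Hypothetical nginx fragment: distinct pages make upstream failures
# (502/503/504) distinguishable from application errors (500)
error_page 500 /500.html;
error_page 502 503 504 /502.html;
```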

Thanks again for everyone's input. I posted my questions to the mongrel
mailing list earlier this week and have not received any response there.
I'm glad the haproxy community was here to help troubleshoot the problem,
and I apologize that it turned out to be a problem with mongrel and not
haproxy.


Ben Fyvie

-----Original Message-----
From: Willy Tarreau [mailto:[email protected]] 
Sent: Wednesday, October 07, 2009 11:53 PM
To: Ben Fyvie
Cc: [email protected]
Subject: Re: 502 errors continue

On Wed, Oct 07, 2009 at 05:21:00PM -0500, Ben Fyvie wrote:
> Thanks Willy,
> 
> > it's clear that the server unexpectedly closed the connection or died
> 
> You are absolutely correct. We've found that the problem is not to do with
> malformed headers, so we don't need to debug that like I had previously
> thought we might.

OK

(...)
> After doing all of this we found that we were still getting 502 errors.
> One of which was this:
> 
> Oct 7 11:31:28 localhost haproxy[22110]: 127.0.0.1:36222
> [07/Oct/2009:11:31:27.522] nightingalenotes_staging
> nightingalenotes_staging/nightingalenotes_5000 0/0/0/-1/861 502 204 - - SL--
> 0/0/0/0/0 0/0 "GET
> /check_session?authenticity_token=fa175d0290443acf8ae9f2cb3d9104098facc3f7
> HTTP/1.0"

(...) correct analysis below

> So what our problem really comes down to is why doesn't mongrel quietly
> stop receiving requests after monit issues the initial "kill". (FYI - it
> is our understanding that calling "mongrel stop" also issues a "kill"
> command so there is no nicer way to ask it to shut down)
> 
> The thing that really confuses us is that prior to putting haproxy into
> place we didn't receive any problems when a mongrel instance was
> restarted. This leads us to believe that there must be something that
> haproxy is doing that prevents the mongrel instance from shutting down
> nicely when issued the "kill" command. Does haproxy keep a constant
> connection with the server (mongrel in our case)???

No, but one thing is possible : haproxy regulates traffic between servers
and ensures the queue is served as fast as possible. This means that as
soon as one connection is released, another pending one is immediately
served, just a few microseconds later. It is fairly possible that mongrel
needs to have some idle time between requests to decide to soft-stop. This
would be stupid, but everything is possible with this server... The correct
way of performing a soft stop is to stop listening first, even if there's a
request being processed. But clearly I'm not sure mongrel does it right.
Another thing to investigate : why on earth is it required to constantly
kill that server ??? Normally this should only happen in extreme cases,
may be a few times a year or even a few times a month. In these situations,
you wouldn't mind that much about one or two 502 responses being sent.

Also another thing : since this server is limited to only one connection,
as long as a request is being served, an HTTP health check cannot be served.
This means you have to set a large enough "fall" parameter for your checks
in order not to detect a server as down while it's processing a request. Of
course this prevents haproxy from quickly detecting a soft stop condition.

Regards,
Willy
