On 6 February 2012 20:52, stefanoC <[email protected]> wrote:
> I'm back!
>
> I've got both good and bad news.
>
> The good: mod_wsgi 4.0 worked perfectly well.
>
> The bad: I still did not manage to fine tune the apache2+mod_wsgi
> +monitoring settings to a point where I was in control.
>
> I have an issue, definitely to be solved outside wsgi, of too many
> slow requests piling up at times.
> When that happens, there's a starvation effect that completely locks
> up apache2 - it has to be restarted, or it won't serve any requests
> anymore.

Yep. I am aware of the issue. So long as the request threads don't
block completely, which is a different issue handled by the
blocked-requests option, it isn't so much that it locks Apache up as
that it causes an internal backlog of requests: even when the long
requests finish, the daemon processes will still work through the
backlog, even though the original users may have given up. In
processing that big backlog, because you then get a big influx of
requests together, you might again end up with a lot of longer
requests coinciding, and so it starts over. Thus it can take a while
for things to stabilise, and if the resources of the system as a
whole aren't sufficient, it can simply make the whole box grind to a
halt.

This can, to a degree, also happen when nginx is used as a front end,
as any multi-hop solution introduces these potential backlog points
solely due to the listen queue size on each socket. Apache/mod_wsgi
currently makes it a bit worse and easier to trigger, though.

For Apache/mod_wsgi you have the default (but configurable) listen
backlog of 100. So, if all processes/threads were busy, 100 more
requests could still queue up before clients start getting connection
refused errors. At the same time, you will have as many requests as
you have processes/threads in an accepted state being handled within
the Apache child worker processes themselves, or, if using daemon
mode, being proxied to the daemon processes.
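To make the two queueing points concrete, here is a hedged config
sketch (worker MPM plus daemon mode; all the numbers, the name
"myapp", and the paths are purely illustrative, not a recommended
setup):

```apache
# Queue 1: Apache's main listener socket (configurable via ListenBacklog).
ListenBacklog 100

# Worker MPM: total requests Apache can have in an accepted state is
# roughly ServerLimit x ThreadsPerChild (here 4 x 25 = 100).
<IfModule mpm_worker_module>
    ServerLimit      4
    ThreadsPerChild 25
</IfModule>

# Queue 2: the daemon process group's own socket, which has its own
# (historically fixed) listen backlog of 100 behind these threads.
WSGIDaemonProcess myapp processes=2 threads=15
WSGIScriptAlias / /var/www/myapp/app.wsgi process-group=myapp
```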

Normally the total number of Apache MPM processes/threads would be
more than the mod_wsgi daemon processes/threads, but because the
daemon processes also have a listen backlog of 100, when all daemon
processes/threads are busy the proxied requests queue up internally,
and depending on how the numbers work out, it all acts as a big
funnel with no way for things to break out. In other words, if the
total number of Apache MPM threads across all processes is less than
the number of daemon threads plus 100, you will never get a
connection refused from the daemon process.
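The arithmetic of that funnel can be sketched as follows (the
process/thread counts are invented for illustration, not taken from
any real config):

```python
# Illustrative numbers only: when can the proxy ever see "connection refused"?
DAEMON_BACKLOG = 100           # daemon socket listen backlog (fixed pre-4.0)

daemon_threads = 2 * 15        # e.g. processes=2, threads=15 (hypothetical)
mpm_threads = 4 * 25           # e.g. ServerLimit=4, ThreadsPerChild=25 (hypothetical)

# The daemon side can absorb this many proxied requests before refusing any:
daemon_capacity = daemon_threads + DAEMON_BACKLOG   # 30 + 100 = 130

# Apache can have at most mpm_threads requests in flight to the daemon, so
# if mpm_threads <= daemon_capacity, every proxied request just queues up:
# a funnel with no overflow signal ever reaching the client.
can_refuse = mpm_threads > daemon_capacity
print(can_refuse)   # False: 100 <= 130, the backlog silently absorbs everything
```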

Even if that were exceeded and you got a connection refused, the
proxy code for talking to the daemon process makes further attempts
to connect to it. This was done to cope with daemon processes not
being quite ready due to restarts or otherwise.

The problem here is the combination of the large daemon listen
backlog (which wasn't even configurable until 4.0) and the retry
mechanism.

What I had started playing with, but never got a chance to finish,
was making the daemon listen backlog configurable, or automatically
adjusting it based on the daemon and MPM config, but also changing
when/how reconnect attempts are made.

The eventual aim was to introduce a way out of the funnel so you don't
get the backlogging problem and the issues it causes with daemon
processes getting overwhelmed and not being able to catch up again.

So, one way or another, the aim is for 503 errors to be generated
when an internal backlog occurs, with the 503 going back to the
client. That way, if the daemon processes get overwhelmed, new
requests coming in will time out and be thrown away. Yes, it means
users will see errors, but at least you don't get a backlog, so when
the daemon recovers it hasn't got a pipe stuffed full of requests it
will never be able to handle.
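As a rough illustration of that load-shedding idea (this is not
mod_wsgi's actual mechanism; the class and all names here are
invented), a WSGI middleware could fail fast with a 503 once its
concurrency slots are exhausted instead of letting requests queue:

```python
import threading

class LoadShedder:
    """Hypothetical WSGI middleware that sheds load with a 503
    once max_in_flight requests are already being handled."""

    def __init__(self, app, max_in_flight=30):
        self.app = app
        self.sem = threading.Semaphore(max_in_flight)

    def __call__(self, environ, start_response):
        # Non-blocking acquire: if every slot is busy, fail fast with
        # a 503 rather than queueing this request behind slow ones.
        if not self.sem.acquire(blocking=False):
            start_response('503 Service Unavailable',
                           [('Content-Type', 'text/plain'),
                            ('Retry-After', '5')])
            return [b'Server busy, try again shortly.\n']
        try:
            # Materialise the response so the slot is held while it is
            # generated; a streaming app would need close() handling instead.
            return list(self.app(environ, start_response))
        finally:
            self.sem.release()
```

The point is only that the overflow signal reaches the client
immediately, instead of sitting in a socket backlog until the user
has long since given up.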

Graham

> The slow requests are database intensive, and can't easily be made
> asynchronous. There are really few of them, but when more than a
> handful happen together, problems start.
>
> I had set up monit to restart apache2 when it becomes unresponsive,
> but getting the timing right for this is difficult, and I end up
> overloading the machine more than anything by restarting slow
> processes that were doing okay.
> Also, at times of high traffic I could end up with monit restarting
> apache2 several times in a row, and eventually giving up, leaving a
> zombie apache2.
>
> Again, I repeat this has nothing to do with mod_wsgi itself. I tried
> switching to gunicorn (I already had nginx as a frontend);
> performance itself is quite similar (I did not benchmark, but at
> least New Relic was reporting similar throughput).
> But it's so much easier to tune (the settings are few, and there isn't
> an apache2 to take care of) and monitor (with supervisord) that I'm
> sticking with this solution for the time being. Had I a better
> knowledge of Linux/Apache internals (or maybe if 4.0 were official
> with precise docs for the new settings) I might have achieved the
> same results with mod_wsgi.
>
> Thanks Graham for your support. I definitely do not mean to invite
> others to move away from mod_wsgi, which had been serving us well
> until this problem arose, but I felt I should share my feedback. BTW,
> the apache2/mod_wsgi confs are still there and I can switch from one
> to the other with a few line changes...
>
> Stefano
>
>
>
> On Dec 13 2011, 12:53 pm, stefanoC <[email protected]> wrote:
>> I'm building / installing from source on Ubuntu (old one!).
>>
>> I'm currently making some tests on the pre-production machine before
>> we jump on production, and the biggest headache is still the best
>> apache2 worker conf that will not overrun the machine.
>>
>> But I'll keep posted!
>>
>> On Dec 11, 7:54 pm, Rodrigo Campos <[email protected]> wrote:
>>
>> > On Tue, Dec 06, 2011 at 02:47:49AM -0800, stefanoC wrote:
>> > > Finally managed to jump from
>> > >http://serverfault.com/questions/335633/apachemod-wsgi-configuration-...
>> > > to this mailing list.
>>
>> > Interesting thread. We might be hitting a similar issue.
>>
>> > > I am ready to try the switch, already compiled and run mod_wsgi 4.0 on
>> > > a test machine with the same config as usual.
>>
>> > We were about to try it, but some other stuff got in the way. We plan to do it,
>> > but not sure when :(
>>
>> > But if you do it before us, please let us know how it went and how you did it.
>> > Are you using debian/ubuntu? Did you make a debian package? I was planning to
>> > make a debian package (we are using ubuntu) based on debian's package, using
>> > uupdate (I think it should be that easy, although I didn't try to do it yet)
>>
>> > Thanks,
>> > Rodrigo
>
> --
> You received this message because you are subscribed to the Google Groups 
> "modwsgi" group.
> To post to this group, send email to [email protected].
> To unsubscribe from this group, send email to 
> [email protected].
> For more options, visit this group at 
> http://groups.google.com/group/modwsgi?hl=en.
>
