On 11 May 2010 08:50, Alec Flett <[email protected]> wrote:
>
> On May 10, 2010, at 3:37 PM, Graham Dumpleton wrote:
>
>>
>> This is logical given that you have configured Apache server child
>> processes to be able to accept more requests than can be funnelled
>> into the mod_wsgi daemon processes. The Apache server child processes
>> will be blocked on the connect() call to the mod_wsgi daemon processes
>> waiting for them to call accept() on the listener socket. When it
>> finally connects, it will straight away start trying to proxy data
>> from the client across to the daemon process. If the client had
>> disconnected, it will only discover that when it attempts to read
>> data from the client. This is because Apache uses blocking threads
>> and not an event driven select/poll loop.
>
> Am I right to assume that for a GET,

Most of the time GET requests have no body.

> the entire request would have been consumed by apache while it was waiting 
> for a mod_wsgi daemon to become available,

Apache would have consumed the request headers only, prior to
attempting to connect to the mod_wsgi daemon process. It is only after
the connect succeeds that an attempt is made to read the body and
proxy it through to the mod_wsgi daemon process. That is at least the
case for HTTP. Things are potentially more complicated with HTTPS and
I don't know what would happen there.

> so that when it gets to mod_wsgi, the entire request looks intact to the 
> Python handler?

Only for a GET with no body. The body may or may not be complete,
depending on its size, how much the client had already written, and
how much of that was able to be buffered in the network connection
buffers before the client connection got dropped.

> Is there any way that mod_wsgi could detect that the client has disconnected, 
> so that it could avoid actually trying to run the request?

Not really.

I should point out that how things work here is pretty much the same
as how mod_proxy, FastCGI and SCGI solutions work.

> We're mostly trying to allow an upstream proxy to throttle its load to the 
> appserver by timing out requests that have been sitting around a while.
>
> it seems like we really should be setting MaxClients == # of daemons, and 
> letting the http requests pile up in the kernel, rather than letting apache 
> have a crack at it. Does this seem reasonable?

If each mod_wsgi daemon process has only one thread, then that is the
logical maximum capacity. In practice, though, it would be an extreme
case if you achieved that, especially if Apache is also handling
static files or you have KeepAlive enabled, as both of the latter will
consume Apache server threads. Thus, having a one to one
correspondence may be excessive and just waste resources in the
mod_wsgi daemon processes. The ratio therefore has to be chosen so you
have enough capacity to handle the requests that are likely to get
through to the daemon processes. Unfortunately, there is no inbuilt
statistics generation for working that out, although some WSGI
middleware examples were worked on previously in discussions on this
list to work out how many threads in the daemon processes were
actually being utilised.
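For what it is worth, that sort of middleware can be sketched roughly
as a wrapper which counts how many threads are concurrently inside the
application. The class and attribute names here are my own, and note
that this simple version decrements before the response iterable is
consumed, so it undercounts streaming responses:

```python
import threading

class ThreadUsageMiddleware(object):
    """Rough sketch: track how many daemon process threads are
    concurrently executing the wrapped WSGI application."""

    def __init__(self, application):
        self.application = application
        self.lock = threading.Lock()
        self.active = 0   # requests currently inside the application
        self.peak = 0     # high-water mark of concurrent requests

    def __call__(self, environ, start_response):
        with self.lock:
            self.active += 1
            if self.active > self.peak:
                self.peak = self.active
        try:
            return self.application(environ, start_response)
        finally:
            # Runs before the response iterable is fully consumed,
            # so streaming responses are undercounted.
            with self.lock:
                self.active -= 1
```

Comparing the observed peak over time against the number of threads
configured for the daemon processes gives a feel for whether the ratio
is too generous.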

> On a related note, and mostly out of curiosity, does that mean that 
> mod_wsgi/apache is essentially leaving it up to the kernel to decide the 
> order to process the requests (since everyone is blocked on a connect())  - 
> and is that a reliable way to ensure that the first requests are processed 
> more or less in the order they arrived? (I'm fully expecting the answer to 
> this to be "well duh of course, that's how all servers work...")

A daemon process group uses a single socket to accept connections
across all processes, just like Apache only has one listener socket
for port 80. Thus, there is no internal load balancer, and so yes, you
are at the mercy of the kernel as to which request is next let through
the accept() call. If you have problems with that, though, you are
going to have the same problem with Apache and port 80. I would assume
that a kernel tracks pending connects in order of arrival.

FWIW, these sorts of issues are why putting an nginx proxy in front of
Apache/mod_wsgi may be a good idea. This is because nginx will buffer
the request body up to a set value (default 1MB, from memory) before
actually proxying the request to Apache. This way nginx acts as a
buffer for slow clients, and Apache only gets involved when, in most
cases, all of the request, headers and body, is known to be available.
Apache itself will thus not have to wait and can act on the request
immediately. Overall this means that the limited set of blocking
threads in Apache is better utilised and Apache can do more with less.
Using nginx also means you can offload serving of static files to it.
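As a rough illustration only (the paths, port and size values are
placeholders; check the nginx documentation for the actual defaults
appropriate to your setup), such a front end might be configured along
these lines, with Apache/mod_wsgi assumed to be listening on port
8080 on the same host:

```nginx
server {
    listen 80;

    # Serve static files directly from nginx.
    location /static/ {
        root /var/www;
    }

    # Everything else is proxied through to Apache, with nginx
    # reading the request from the slow client first.
    location / {
        proxy_pass http://127.0.0.1:8080;
        proxy_set_header Host $host;
        client_max_body_size 1m;
    }
}
```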

Graham

-- 
You received this message because you are subscribed to the Google Groups 
"modwsgi" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to 
[email protected].
For more options, visit this group at 
http://groups.google.com/group/modwsgi?hl=en.
