On Aug 26, 10:57 pm, Alvaro Lopez Ortega <[EMAIL PROTECTED]> wrote:

> >> So, despite what you suggest, if a bare minimum interpreter is twice as
> >> big as the web server, I wouldn't personally call it "small". IMO Python
> >> rocks anyway, but calling it small may be too much.
>
> > But to put that in further context, it is not uncommon for a process
> > holding a Python web framework instance to be anywhere between 40 and
> > 100MB for a typical site.
>
> You have just raised my point. You do NOT want your web server process
> to weigh 100MB if it can weigh 1.3MB!
>
> It might be my perception, but having such a big web server in memory
> without a really REALLY good reason sounds mad. And the point is that
> there is no such good reason, just a few problems.
As I pointed out previously, a large site like that wouldn't serve static media from the same web server, but from a separate web server instance such as nginx or lighttpd. Thus, the only reason the web server component exists in this case is to feed requests into the dynamic Python web application. So, except for purist ideals, it makes no difference whether you embed Python in the web server, as with Apache/mod_wsgi in embedded mode, or embed the web server in the Python application, as with the Paste HTTP server. In both cases, for an equivalent configuration, the amount of overall memory used isn't going to be much different.

When using FASTCGI and SCGI things aren't much different, because in a WSGI application it is the Python code which does all the URL dispatch. In other words, any ability of the web server to dispatch based on the URL isn't used; it just funnels all requests below the application mount point through to the FASTCGI/SCGI process. In the process you have interjected the need to perform a translation from HTTP to FASTCGI/SCGI to WSGI, whereas in the cases above one goes more or less directly from HTTP to WSGI.

> Having Python plus the application logic executing under another PID is
> much cleaner from the architectural point of view, and as a matter of
> fact, more secure. For instance, if the interpreter crashes because of a
> bug in one of the dozens of libraries that it needs to load, I want my
> web server to continue serving cached and static content until a new
> application server is spawned and ready to use.

Since Apache is a multiprocess web server, if one child process crashes there are still other processes handling requests while the crashed child is replaced. The only configuration where concurrent requests may be affected is the multithreaded worker MPM; in that case any concurrent requests in the same process that crashed would be affected.
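The point that URL dispatch lives in the Python code, not the web server, can be seen in even a minimal WSGI application: the server hands over PATH_INFO and the application decides what to do with it. A sketch (the handler names and routing table here are illustrative, not from any particular framework):

```python
# Minimal WSGI application doing its own URL dispatch.
# The front-end web server just funnels every request under the
# mount point to this callable; routing happens entirely in Python.

def homepage(environ, start_response):
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return [b'home\n']

def about(environ, start_response):
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return [b'about\n']

# Hypothetical routing table mapping path to handler.
ROUTES = {'/': homepage, '/about': about}

def application(environ, start_response):
    handler = ROUTES.get(environ.get('PATH_INFO', '/'))
    if handler is None:
        start_response('404 Not Found', [('Content-Type', 'text/plain')])
        return [b'not found\n']
    return handler(environ, start_response)
```

The same callable works unchanged whether it is mounted under mod_wsgi, behind a FASTCGI/SCGI bridge such as flup, or inside a pure-Python HTTP server, which is why the only real difference between those deployments is the transport in front of it.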
For large high performance sites, it would always be preferable to use lots of physical memory and the single threaded prefork MPM. In that case only the one request that caused the crash would be affected.

This is a reliability issue and has nothing to do with security. In mod_wsgi daemon mode one can still run the application in a distinct process under a different user ID. It does not use suEXEC, and the process is a fork of the Apache parent, so technically one could dig into Apache data structures inherited from the parent process. In other words, it is not as secure as suEXEC, but if you control your own site, or trust the users, it is acceptable. If you want to set up a hostile environment such as shared hosting for unknown users, then yes, suEXEC and FASTCGI/SCGI would be preferable, and I am not arguing otherwise.

> What would happen if the interpreter called a blocking third party
> function? Depending on the web server architecture it would either
> starve or trigger a -let's call it- emergency mechanism to take care of
> the situation (which does not sound clean nor reliable at all).

What happens in that case depends on the configuration and on how long the call blocks. For Apache prefork and mod_wsgi in embedded mode, because each process is single threaded, nothing else is affected. For the Apache worker MPM and mod_wsgi in embedded mode, then yes, holding the GIL without releasing it would cause all concurrent Python requests in the same process to wait. In both cases though, because Apache is multiprocess, other processes can still handle requests.

Finally, if using mod_wsgi in daemon mode, if such buggy code existed and didn't release the GIL before going into a blocking state, or the code simply managed to deadlock itself, then a feature of mod_wsgi comes to the rescue. This feature implements a mechanism which detects a deadlock in the use of the Python GIL, or a request handler that has held the lock for too long, and the process is forcibly killed off.
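For reference, the daemon-mode separation described above is driven by the WSGIDaemonProcess directive in the Apache configuration; a sketch along these lines, where the process group name, user/group, and paths are placeholders for your own site:

```apache
# Run the WSGI application in its own daemon processes, under a
# separate user/group from the Apache children (no suEXEC involved;
# the daemon processes are still forked from the Apache parent).
WSGIDaemonProcess myapp user=appuser group=appgroup processes=2 threads=15
WSGIScriptAlias /myapp /srv/myapp/app.wsgi

<Location /myapp>
    # Delegate requests under /myapp to the daemon process group.
    WSGIProcessGroup myapp
</Location>
```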
Because Apache/mod_wsgi handles process management, that process will then be automatically restarted. So, for daemon mode of mod_wsgi at least, it protects itself from such conditions and ensures that the application keeps operating rather than hanging.

The thing is, though, that any code which doesn't release the GIL when going into a blocking call is flawed and not implemented correctly, as C extension modules for Python are supposed to release the GIL around blocking operations. You are therefore trying to argue that the whole concept is bad because someone might write buggy software when they shouldn't. Also, this condition can still occur when using FASTCGI and SCGI. In that situation it is actually worse, as there is no deadlock detection mechanism like mod_wsgi's, and so the FASTCGI/SCGI process just hangs until a user manually kills and restarts it.

> Besides, running the same WSGI application as a SCGI app-server would,
> in the worst scenario, be as fast as running it in the embedded
> interpreter (in fact, I bet it is faster).

If you compare FASTCGI/SCGI on Apache with both mod_wsgi embedded mode and mod_wsgi daemon mode for a simple hello world application, thus effectively testing just the web serving mechanism, the FASTCGI/SCGI implementations are slower. If you compare Apache/mod_wsgi to FASTCGI/SCGI running under a different web server you may well get different results, but because FASTCGI/SCGI require an extra hop, and due to the overhead of 'flup' needed to bridge to WSGI in the backend process, you may find it isn't as good as you think. As pointed out before, though, all this is well and good for a hello world application, but in real world applications the network isn't the bottleneck, and performance differences at this level are swallowed up by those other overheads.

> So, I still understand that there is a basic issue in the model. There
> is no clear win for the couple of problems that are introduced (security
> and architecture).
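For concreteness, the "hello world" application in this sort of comparison is just the canonical WSGI callable; the same code can sit under mod_wsgi, behind flup's FASTCGI/SCGI servers, or on the stdlib wsgiref server, so timing it exercises only the web serving mechanism. A sketch, not the actual benchmark code:

```python
# Canonical WSGI "hello world". Because the application itself does
# almost no work, benchmarking it compares only the serving path
# (embedded vs daemon vs FASTCGI/SCGI), not the application.

def application(environ, start_response):
    body = b'Hello World!\n'
    start_response('200 OK', [
        ('Content-Type', 'text/plain'),
        ('Content-Length', str(len(body))),
    ])
    return [body]

if __name__ == '__main__':
    # Standalone check using the stdlib reference server.
    from wsgiref.simple_server import make_server
    httpd = make_server('127.0.0.1', 8000, application)
    httpd.handle_request()  # serve one request, then exit
```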
True, they are simply different ways of doing things, and as mentioned before it often comes down to what people find easier to configure and use. As much as we argue about embedded versus FASTCGI/SCGI like solutions, other people, such as the Pylons community, would outright refuse to use either. Their religion is that running the Paste HTTP server inside of the Python application behind mod_proxy is the only true path.

So, choice is good. :-)

Graham

_______________________________________________
Cherokee mailing list
[email protected]
http://lists.octality.com/listinfo/cherokee
