FWIW, it would do much more for Cherokee's credibility, presuming you speak for its development team, if you presented proper informed analysis rather than conjecture and FUD.
Long rant below. :-)

On Aug 26, 2:25 am, Alvaro Lopez Ortega <[EMAIL PROTECTED]> wrote:
> Michael Schurter wrote:
> > Please forgive my naive question, but I've been following Cherokee for
> > a while without using it yet on any production servers.
> >
> > Any chance of Cherokee speaking WSGI natively in the future like
> > mod_wsgi for Apache?
> >
> > I've just been really happy with mod_wsgi for Apache, but I'd love to
> > switch to a lighter weight HTTP server like Cherokee.
>
> This is a very good question, indeed.
>
> My understanding is that we should not implement anything like mod_wsgi
> for a number of reasons.
>
> Firstly, from the architectural point of view it is simply madness: how
> would somebody want a web server to contain a huge interpreter that is
> linked against dozens of libraries?

Python itself is not some 'huge interpreter'. The runtime overhead of an interpreter instance is only a few hundred kilobytes. The misconception that it is much greater than this exists because for years the Python packages distributed by Linux systems provided only a static library, and not a shared library, for the Python code. By default, building Python from source code also wouldn't install a shared library.

The result of this was that the Python library had to be embedded into the Apache module, and when this got loaded into memory, because the object code wasn't relocatable, address relocations had to be done at load time. This meant the executable object code became local process memory and so consumed a few extra MB per process.

When Python is installed properly with a shared library, as Linux distributions now do, this is not an issue, and one can see that the true memory use of the Python interpreter itself is quite minimal, with object code in the Python library and in Python C extension modules being shared rather than duplicated per process. So, to blame the interpreter here is quite wrong.
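As an aside, whether a given installation has the shared library configuration described above can be checked directly; the following is a small sketch using the standard `sysconfig` module (the printed values will of course vary per installation):

```python
# Check whether this Python was built with a shared libpython
# (Py_ENABLE_SHARED=1): the configuration that lets an embedding host
# like Apache share the interpreter's object code between processes.
import sysconfig

shared = sysconfig.get_config_var("Py_ENABLE_SHARED")
libdir = sysconfig.get_config_var("LIBDIR")

print("Built with shared libpython:", bool(shared))
print("Library directory:", libdir)
```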
This isn't to say that the Python web application itself doesn't result in large amounts of memory usage. Obviously you can load up a very slim web application and the amount of memory it uses will be small, but one can also load up one of the fat frameworks such as Django, Pylons, TurboGears or Zope and it will take a lot more memory. The real culprit here is therefore not the Python interpreter, but the specific web application you load up.

Even then the argument is really quite stupid, as the web application itself will take the same amount of memory whether it is hosted in the web server process or in a back end process which the web server communicates with using HTTP proxying, SCGI or FASTCGI. The only time the amount of memory used by the web application would differ is where the number of processes across which the web application runs differs between the embedded configuration and the backend configuration.

What I am talking about here is running Apache on UNIX systems with the prefork or worker MPM, where it runs as a multiprocess web server. In that case there is a copy of the web application in each Apache child process, and so the amount of memory is multiplied by the number of child processes. This, though, comes down to how you configure Apache. Unfortunately most people are totally ignorant of how Apache works and don't even understand that it uses multiple processes, or what the implications of that are. Thus they never review the standard Apache configuration, nor make a more appropriate choice of Apache MPM and its configuration when running Python web applications. The end result is that they see all this memory consumed by Apache and blame Python, when it is instead their failure to configure Apache appropriately.
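By way of illustration, capping the prefork MPM's child process count is one such configuration review; the fragment below is only a sketch with arbitrary numbers, not recommended values:

```apache
# Illustrative httpd.conf fragment: limit the number of prefork child
# processes so a fat embedded Python application isn't duplicated
# across dozens of children. Values here are arbitrary examples.
<IfModule mpm_prefork_module>
    StartServers          2
    MinSpareServers       2
    MaxSpareServers       5
    MaxClients           20
    MaxRequestsPerChild 500
</IfModule>
```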
One could have the same memory issues come up even when hosting the web application in a back end process, if you created multiple instances and load balanced across them. When using the back end process approach, though, people have to do something to get it to work, and so don't switch their brains off entirely like they do with Apache, where it all just happens for them automatically.

Overall, configuring Apache becomes a trade off between memory usage and performance/scalability. If you have a serious web site, then you would be putting lots of physical memory in the box, and in that case a multiprocess configuration is acceptable given that it gives you better performance and scalability than the alternatives. Thus it is an architecture decision.

Some argue that putting the Python web application in the web server causes a drop in static file serving performance. This is true, but for any serious large scale site you wouldn't use the same server instance to host your static media files. Instead you would use a separate instance, and would perhaps even deploy nginx or lighttpd for that task as they provide better performance for it. For the dynamic Python web application, though, Apache still provides a better, more scalable option.

> Second, it sounds hard to believe that mod_wsgi is faster than a plain
> and simple SCGI application writing to a Unix socket. (Remember that WSGI
> application can also use FastCGI and SCGI backends).

This comment shows you don't even understand how mod_wsgi works in conjunction with Apache. When mod_wsgi is used to host a Python web application, you can select one of two modes in which to run the application.

The first mode is embedded mode, whereby the Python web application is hosted within the Apache child processes themselves. When embedded mode is used, the issue of multiple copies of the Python web application and the extra memory usage described above needs to be taken into consideration.
When this mode is used, though, you do not have any proxying occurring like you do with FASTCGI and SCGI. This is because the Python request handler code runs inside the same process where the underlying Apache code accepted and interpreted the request. In other words, there is no socket involved, nor any wire protocol for marshalling data across a socket. Instead, the internal Apache request structure is morphed directly into a Python data structure and passed directly to the Python web application in the same process via the WSGI programmatic API.

So, the short answer is that because no separate proxy hop is required, nor any additional marshalling and reinterpretation of the request in a back end process, it is obviously going to perform better than an SCGI or FASTCGI solution. It should be emphasised at this point that WSGI is a programming API and not a wire protocol like SCGI and FASTCGI. Many people don't seem to understand this.

Getting on to the second mode in which a Python web application can be run with mod_wsgi: daemon mode. In daemon mode the Python web application is not run in the Apache child processes themselves, but in a distinct backend daemon process, or group of processes. In other words, the process model is exactly like SCGI or FASTCGI, with the exception that Apache/mod_wsgi handles all the process management for you, and also internally manages the marshalling and unpacking of the data across the UNIX socket used to communicate with the back end processes, without the need for some adapter such as 'flup' within the Python web application.

Thus, with daemon mode all your arguments about memory usage are moot anyway, as the Python web application doesn't run in the Apache child processes, and therefore not in the processes which are serving up static content. When daemon mode is compared against SCGI and FASTCGI for Apache, it is still somewhat faster, even though the same process and proxying model is used.
This is mainly because Apache/mod_wsgi implements the back end process in C code as well, performing the unpacking of the proxied request data and again passing it directly to the embedded Python web application using the WSGI API. In the case of SCGI and FASTCGI you still need an adapter such as 'flup', and that slows things down.

The thing is, even though it is faster, it doesn't matter in the grand scheme of things. This is because network performance is almost never the bottleneck in Python web applications. Instead it is the performance of the Python web application's own code and any database access. Thus, any difference in network performance is only going to amount to milliseconds per request, within the context of a request taking hundreds of milliseconds or possibly more.

> What people want is a fast web infrastructure, I do not believe there is
> someone who actually fancies Apache's bulky and monolithic style. Think
> it in this way: if you could keep the application logic running
> independently without hitting the performance (which is far more clean
> and secure), wouldn't you?

As I have already indicated, perceived speed isn't everything. For large sites you would host static media on a separate server dedicated to that purpose. For a large dynamic Python web application, what you then use ultimately doesn't matter if speed is your concern, as different solutions are only going to differ by a little at best. What is more important is ease of setup and configuration, and right now Apache/mod_wsgi arguably makes that a lot easier for Python web applications than SCGI or FASTCGI solutions do. Apache/mod_wsgi does everything for you: you don't need to set up separate process supervisors and startup scripts for your back end process instances, nor use third party packages such as 'flup' to map those wire protocols into your Python web application through the WSGI API.
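Since the point that WSGI is a programming API rather than a wire protocol keeps coming up, a minimal sketch may help: the host server simply calls a Python function with the request as a plain dict, with no socket or marshalling in sight. The one-call fake "server" below is purely illustrative:

```python
# Minimal WSGI application: the host calls this function in-process,
# passing the request as a plain dict -- no wire protocol involved.
def application(environ, start_response):
    body = b"Hello from " + environ.get("PATH_INFO", "/").encode("ascii")
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Content-Length", str(len(body)))])
    return [body]

# A trivial fake host drives the application with one direct call,
# which is all "hosting a WSGI application" means at the API level.
captured = {}
def start_response(status, headers):
    captured["status"], captured["headers"] = status, headers

result = b"".join(application({"PATH_INFO": "/demo"}, start_response))
print(captured["status"], result)
```

A real host (embedded mode, daemon mode, or a flup-adapted FASTCGI process) differs only in where that call happens and how the environ dict gets populated.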
Daemon mode of mod_wsgi also allows you to run your Python web application as a different user to Apache, thus giving a better level of security. Agreed, how this is achieved is not as locked down as suEXEC; however, for a site which you control it isn't a problem, and it achieves the primary aim of avoiding the need to make stuff writable by the Apache user.

> Give Cherokee (SCGI|FastCGI) a try with your WSGI application.. and, let
> there be light! :-) My belief is that besides fixing both the
> architectural and security flaws you will improve performance.

Yes, by all means give Cherokee a go, and you may well find it works for you and you like it. If so, then use it, but please don't let FUD cloud things.

Now, getting back to the OP's question about implementing something similar to mod_wsgi in Cherokee: whether this even makes sense depends on a few things.

First off, if Cherokee is implemented as an event driven system like nginx and lighttpd, then there isn't really any point. This is because WSGI effectively requires threading to be able to manage concurrency in a single process system. In an event driven system where there is only a single thread of control, the WSGI application blocking would block the whole web server. In nginx/mod_wsgi they try to counter that by relying on multiple processes, much like the Apache prefork model. So, if Cherokee is not multiprocess and doesn't rely on threads, then forget it.

Second, if Cherokee has no way of managing distinct daemon processes for SCGI and FASTCGI based applications, then there is also probably no point. This is because what Apache/mod_wsgi gives you in daemon mode that makes it so attractive is that it manages all the processes for you, ensuring they are started and destroyed as necessary. If Cherokee can't do that, then you may as well just use the SCGI and FASTCGI support that already exists.
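For reference, the daemon mode setup and user separation described above come down to a few directives; the names and paths here are made up purely for illustration:

```apache
# Illustrative mod_wsgi daemon mode fragment (names/paths made up):
# the application runs in its own managed processes, under its own
# account, not in the Apache children and not as the Apache user.
WSGIDaemonProcess myapp processes=2 threads=15 user=myappuser group=myappgroup
WSGIProcessGroup myapp
WSGIScriptAlias / /srv/myapp/app.wsgi
```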
Personally, I would probably agree that adding WSGI support directly to Cherokee would be wasted effort. This is because the amount of actual effort required to add the support wouldn't be justified by what would probably end up being such a small set of end users for that feature.

Graham

_______________________________________________
Cherokee mailing list
[email protected]
http://lists.octality.com/listinfo/cherokee
