FWIW, it would do much more for Cherokee's credibility, presuming you speak for its development team, if you presented proper informed analysis rather than conjecture and FUD.
Long rant below. :-)

On Aug 26, 2:25 am, Alvaro Lopez Ortega <[EMAIL PROTECTED]> wrote:
> Michael Schurter wrote:
> > Please forgive my naive question, but I've been following Cherokee for
> > a while without using it yet on any production servers.
> >
> > Any chance of Cherokee speaking WSGI natively in the future like
> > mod_wsgi for Apache?
> >
> > I've just been really happy with mod_wsgi for Apache, but I'd love to
> > switch to a lighter weight HTTP server like Cherokee.
>
> This is a very good question, indeed.
>
> My understanding is that we should not implement anything like mod_wsgi
> for a number of reasons.
>
> Firstly, from the architectural point of view it is simply madness: how
> would somebody want a web server to contain a huge interpreter that is
> linked against dozens of libraries?

Python itself is not some 'huge interpreter'. The runtime overhead of an interpreter instance is only a few hundred kilobytes. The misconception that it is much greater than this exists because for years the Python packages distributed by Linux systems provided only a static library, and not a shared library, for the Python code. By default, building Python from source code also wouldn't install a shared library.

The result of this was that the Python library had to be embedded into the Apache module, and when this got loaded into memory, because the object code wasn't relocatable, address relocations had to be done at load time. This meant the executable object code became local process memory and so consumed a few extra MB per process.

When Python is installed properly with a shared library, as Linux distributions now do, this is not an issue, and one can see that the true memory use of the Python interpreter itself is quite minimal, with object code in the Python library and in Python C extension modules being shared rather than duplicated per process. So, to blame the interpreter here is quite wrong.
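As an aside, whether a given installation has the shared library configuration described above can be checked directly; the following is a small sketch using the standard `sysconfig` module (the printed values will of course vary per installation):

```python
# Check whether this Python was built with a shared libpython
# (Py_ENABLE_SHARED=1): the configuration that lets an embedding host
# like Apache share the interpreter's object code between processes.
import sysconfig

shared = sysconfig.get_config_var("Py_ENABLE_SHARED")
libdir = sysconfig.get_config_var("LIBDIR")

print("Built with shared libpython:", bool(shared))
print("Library directory:", libdir)
```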
This isn't to say that the Python web application itself doesn't result in large amounts of memory usage. Obviously you can load up a very slim web application and the amount of memory it uses will be small, but one can also load up one of the fat frameworks such as Django, Pylons, TurboGears or Zope and it will take a lot more memory. The real culprit here is therefore not the Python interpreter, but the specific web application you load up.

Even then the argument is really quite stupid, as the web application itself will take the same amount of memory whether it is hosted in the web server process or in a back end process which the web server communicates with using HTTP proxying, SCGI or FASTCGI. The only time the amount of memory used by the web application would differ is where the number of processes across which the web application runs differs between the embedded configuration and the backend configuration.

What I am talking about here is running Apache on UNIX systems with the prefork or worker MPM, where it runs as a multiprocess web server. In that case there is a copy of the web application in each Apache child process, and so the amount of memory is multiplied by the number of child processes. This, though, comes down to how you configure Apache. Unfortunately most people are totally ignorant of how Apache works and don't even understand that it uses multiple processes, or what the implications of that are. Thus they never review the standard Apache configuration, nor make a more appropriate choice of Apache MPM and its configuration when running Python web applications. The end result is that they see all this memory consumed by Apache and blame Python, when it is instead their failure to configure Apache appropriately.
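By way of illustration, capping the prefork MPM's child process count is one such configuration review; the fragment below is only a sketch with arbitrary numbers, not recommended values:

```apache
# Illustrative httpd.conf fragment: limit the number of prefork child
# processes so a fat embedded Python application isn't duplicated
# across dozens of children. Values here are arbitrary examples.
<IfModule mpm_prefork_module>
    StartServers          2
    MinSpareServers       2
    MaxSpareServers       5
    MaxClients           20
    MaxRequestsPerChild 500
</IfModule>
```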
One could have the same memory issues come up even when hosting the web application in a back end process, if you created multiple instances and load balanced across them. When using the back end process approach, though, people have to do something to get it to work, and so don't switch their brains off entirely like they do with Apache, where it all just happens for them automatically.

Overall, configuring Apache becomes a trade off between memory usage and performance/scalability. If you have a serious web site, then you would be putting lots of physical memory in the box, and in that case a multiprocess configuration is acceptable given that it gives you better performance and scalability than the alternatives. Thus it is an architecture decision.

Some argue that putting the Python web application in the web server causes a drop in static file serving performance. This is true, but for any serious large scale site you wouldn't use the same server instance to host your static media files. Instead you would use a separate instance, and would perhaps even deploy nginx or lighttpd for that task as they provide better performance for it. For the dynamic Python web application, though, Apache still provides a better, more scalable option.

> Second, it sounds hard to believe that mod_wsgi is faster than a plain
> and simple SCGI application writing to a Unix socket. (Remember that WSGI
> application can also use FastCGI and SCGI backends).

This comment shows you don't even understand how mod_wsgi works in conjunction with Apache. When mod_wsgi is used to host a Python web application, you can select one of two modes in which to run the application.

The first mode is embedded mode, whereby the Python web application is hosted within the Apache child processes themselves. When embedded mode is used, the issue of multiple copies of the Python web application and the extra memory usage described above needs to be taken into consideration.
When this mode is used, though, you do not have any proxying occurring like you do with FASTCGI and SCGI. This is because the Python request handler code runs inside the same process where the underlying Apache code accepted and interpreted the request. In other words, there is no socket involved, nor any wire protocol for marshalling data across a socket. Instead, the internal Apache request structure is morphed directly into a Python data structure and passed directly to the Python web application in the same process via the WSGI programmatic API.

So, the short answer is that because no separate proxy hop is required, nor any additional marshalling and reinterpretation of the request in a back end process, it is obviously going to perform better than an SCGI or FASTCGI solution. It should be emphasised at this point that WSGI is a programming API and not a wire protocol like SCGI and FASTCGI. Many people don't seem to understand this.

Getting on to the second mode in which a Python web application can be run with mod_wsgi: daemon mode. In daemon mode the Python web application is not run in the Apache child processes themselves, but in a distinct backend daemon process, or group of processes. In other words, the process model is exactly like SCGI or FASTCGI, with the exception that Apache/mod_wsgi handles all the process management for you, and also internally manages the marshalling and unpacking of the data across the UNIX socket used to communicate with the back end processes, without the need for some adapter such as 'flup' within the Python web application.

Thus, with daemon mode all your arguments about memory usage are moot anyway, as the Python web application doesn't run in the Apache child processes, and therefore not in the processes which are serving up static content. When daemon mode is compared against SCGI and FASTCGI for Apache, it is still somewhat faster, even though the same process and proxying model is used.
This is mainly because Apache/mod_wsgi implements the back end process in C code as well, performing the unpacking of the proxied request data and again passing it directly to the embedded Python web application using the WSGI API. In the case of SCGI and FASTCGI you still need an adapter such as 'flup', and that slows things down.

The thing is, even though it is faster, it doesn't matter in the grand scheme of things. This is because network performance is almost never the bottleneck in Python web applications. Instead it is the performance of the Python web application's own code and any database access. Thus, any difference in network performance is only going to amount to milliseconds per request, within the context of a request taking hundreds of milliseconds or possibly more.

> What people want is a fast web infrastructure, I do not believe there is
> someone who actually fancies Apache's bulky and monolithic style. Think
> it in this way: if you could keep the application logic running
> independently without hitting the performance (which is far more clean
> and secure), wouldn't you?

As I have already indicated, perceived speed isn't everything. For large sites you would host static media on a separate server dedicated to that purpose. For a large dynamic Python web application, what you then use ultimately doesn't matter if speed is your concern, as different solutions are only going to differ by a little at best. What is more important is ease of setup and configuration, and right now Apache/mod_wsgi arguably makes that a lot easier for Python web applications than SCGI or FASTCGI solutions do. Apache/mod_wsgi does everything for you: you don't need to set up separate process supervisors and startup scripts for your back end process instances, nor use third party packages such as 'flup' to map those wire protocols into your Python web application through the WSGI API.
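Since the point that WSGI is a programming API rather than a wire protocol keeps coming up, a minimal sketch may help: the host server simply calls a Python function with the request as a plain dict, with no socket or marshalling in sight. The one-call fake "server" below is purely illustrative:

```python
# Minimal WSGI application: the host calls this function in-process,
# passing the request as a plain dict -- no wire protocol involved.
def application(environ, start_response):
    body = b"Hello from " + environ.get("PATH_INFO", "/").encode("ascii")
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Content-Length", str(len(body)))])
    return [body]

# A trivial fake host drives the application with one direct call,
# which is all "hosting a WSGI application" means at the API level.
captured = {}
def start_response(status, headers):
    captured["status"], captured["headers"] = status, headers

result = b"".join(application({"PATH_INFO": "/demo"}, start_response))
print(captured["status"], result)
```

A real host (embedded mode, daemon mode, or a flup-adapted FASTCGI process) differs only in where that call happens and how the environ dict gets populated.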
Daemon mode of mod_wsgi also allows you to run your Python web application as a different user to Apache, thus giving a better level of security. Agreed, how this is achieved is not as locked down as suEXEC; however, for a site which you control it isn't a problem, and it achieves the primary aim of avoiding the need to make stuff writable by the Apache user.

> Give Cherokee (SCGI|FastCGI) a try with your WSGI application.. and, let
> there be light! :-) My belief is that besides fixing both the
> architectural and security flaws you will improve performance.

Yes, by all means give Cherokee a go, and you may well find it works for you and you like it. If so, then use it, but please don't let FUD cloud things.

Now, getting back to the OP's question about implementing something similar to mod_wsgi in Cherokee: whether this even makes sense depends on a few things.

First off, if Cherokee is implemented as an event driven system like nginx and lighttpd, then there isn't really any point. This is because WSGI effectively requires threading to be able to manage concurrency in a single process system. In an event driven system where there is only a single thread of control, the WSGI application blocking would block the whole web server. In nginx/mod_wsgi they try to counter that by relying on multiple processes, much like the Apache prefork model. So, if Cherokee is not multiprocess and doesn't rely on threads, then forget it.

Second, if Cherokee has no way of managing distinct daemon processes for SCGI and FASTCGI based applications, then there is also probably no point. This is because what Apache/mod_wsgi gives you in daemon mode that makes it so attractive is that it manages all the processes for you, ensuring they are started and destroyed as necessary. If Cherokee can't do that, then you may as well just use the SCGI and FASTCGI support that already exists.
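For reference, the daemon mode setup and user separation described above come down to a few directives; the names and paths here are made up purely for illustration:

```apache
# Illustrative mod_wsgi daemon mode fragment (names/paths made up):
# the application runs in its own managed processes, under its own
# account, not in the Apache children and not as the Apache user.
WSGIDaemonProcess myapp processes=2 threads=15 user=myappuser group=myappgroup
WSGIProcessGroup myapp
WSGIScriptAlias / /srv/myapp/app.wsgi
```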
Personally, I would probably agree that adding WSGI support directly to Cherokee would be wasted effort. This is because the amount of actual effort required to add the support wouldn't be justified by what would probably end up being such a small set of end users for that feature.

Graham

_______________________________________________
Cherokee mailing list
[email protected]
http://lists.octality.com/listinfo/cherokee
