On Aug 26, 10:57 pm, Alvaro Lopez Ortega <[EMAIL PROTECTED]> wrote:

> >> So, despite what you suggest, if a bare minimum interpreter is twice as
> >> big as the web server, I wouldn't personally call it "small". IMO Python
> >> rocks anyway, but calling it small may be too much.
>
> > But to put that in further context, it is not uncommon for a process
> > holding a Python web framework instance to be anywhere between 40 and
> > 100MB for a typical site.
>
> You have just raised my point. You do NOT want your web server process
> to weigh 100MB if it can weigh 1.3MB!
>
> It might be my perception, but having such a big web server in memory
> without a really REALLY good reason sounds mad. And the point is that
> there is no such good reason, just a few problems.
As I pointed out previously, a large site like that wouldn't serve static media from the same web server, but from a separate web server instance such as nginx or lighttpd. Thus, the only reason the web server component exists in this case is to feed requests into the dynamic Python web application. So, except for purist ideals, it makes no difference whether you embed Python in the web server, as with Apache/mod_wsgi in embedded mode, or embed the web server in the Python application, as with the Paste HTTP server. In both cases, for an equivalent configuration, the amount of overall memory used isn't going to be much different.

When using FASTCGI and SCGI things aren't much different, because in a WSGI application it is the Python code which does all the URL dispatch. In other words, any ability of the web server to dispatch based on the URL isn't used; it just funnels all requests below the application mount point through to the FASTCGI/SCGI process. In the process you have interjected the need to perform a translation from HTTP to FASTCGI/SCGI to WSGI, whereas in the cases above one goes more or less directly from HTTP to WSGI.

> Having Python plus the application logic executing under another PID is
> much cleaner from the architectural point of view, and as a matter of
> fact, more secure. For instance, if the interpreter crashes because of a
> bug in one of the dozens of libraries that it needs to load, I want my
> web server to continue serving cached and static content until a new
> application server is spawned and ready to use.

Since Apache is a multiprocess web server, if one child process crashes there are still other processes handling requests while the crashed child is replaced. The only configuration where concurrent requests may be affected is the multithreaded worker MPM; in that case any concurrent requests in the same process that crashed would be affected.
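The point that URL dispatch lives in the Python code, not the web server, can be seen in even a minimal WSGI application: the server hands over PATH_INFO and the application decides what to do with it. A sketch (the handler names and routing table here are illustrative, not from any particular framework):

```python
# Minimal WSGI application doing its own URL dispatch.
# The front-end web server just funnels every request under the
# mount point to this callable; routing happens entirely in Python.

def homepage(environ, start_response):
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return [b'home\n']

def about(environ, start_response):
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return [b'about\n']

# Hypothetical routing table mapping path to handler.
ROUTES = {'/': homepage, '/about': about}

def application(environ, start_response):
    handler = ROUTES.get(environ.get('PATH_INFO', '/'))
    if handler is None:
        start_response('404 Not Found', [('Content-Type', 'text/plain')])
        return [b'not found\n']
    return handler(environ, start_response)
```

The same callable works unchanged whether it is mounted under mod_wsgi, behind a FASTCGI/SCGI bridge such as flup, or inside a pure-Python HTTP server, which is why the only real difference between those deployments is the transport in front of it.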
For large high performance sites, it would always be preferable to use lots of physical memory and the single threaded prefork MPM. In that case only the one request that caused the crash would be affected.

This is a reliability issue and has nothing to do with security. In mod_wsgi daemon mode one can still run the application in a distinct process under a different user ID. It does not use suEXEC, and the process is a fork of the Apache parent, so technically one could dig into Apache data structures inherited from the parent process. In other words, it is not as secure as suEXEC, but if you control your own site, or trust the users, it is acceptable. If you want to set up a hostile environment such as shared hosting for unknown users, then yes, suEXEC and FASTCGI/SCGI would be preferable, and I am not arguing otherwise.

> What would happen if the interpreter called a blocking third party
> function? Depending on the web server architecture it would either
> starve or trigger a -let's call it- emergency mechanism to take care of
> the situation (which does not sound clean nor reliable at all).

What happens in that case depends on the configuration and on how long the call blocks. For Apache prefork and mod_wsgi in embedded mode, because each process is single threaded, nothing else is affected. For the Apache worker MPM and mod_wsgi in embedded mode, then yes, holding the GIL without releasing it would cause all concurrent Python requests in the same process to wait. In both cases though, because Apache is multiprocess, other processes can still handle requests.

Finally, if using mod_wsgi in daemon mode, if such buggy code existed and didn't release the GIL before going into a blocking state, or the code simply managed to deadlock itself, then a feature of mod_wsgi comes to the rescue. This feature implements a mechanism which detects a deadlock in the use of the Python GIL, or a request handler that has held the lock for too long, and the process is forcibly killed off.
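For reference, the daemon-mode separation described above is driven by the WSGIDaemonProcess directive in the Apache configuration; a sketch along these lines, where the process group name, user/group, and paths are placeholders for your own site:

```apache
# Run the WSGI application in its own daemon processes, under a
# separate user/group from the Apache children (no suEXEC involved;
# the daemon processes are still forked from the Apache parent).
WSGIDaemonProcess myapp user=appuser group=appgroup processes=2 threads=15
WSGIScriptAlias /myapp /srv/myapp/app.wsgi

<Location /myapp>
    # Delegate requests under /myapp to the daemon process group.
    WSGIProcessGroup myapp
</Location>
```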
Because Apache/mod_wsgi handles process management, that process will then be automatically restarted. So, for daemon mode of mod_wsgi at least, it protects itself from such conditions and ensures that the application keeps operating rather than hanging.

The thing is, though, that any code which doesn't release the GIL when going into a blocking call is flawed and not implemented correctly, as C extension modules for Python are supposed to release the GIL around blocking operations. You are therefore trying to argue that the whole concept is bad because someone might write buggy software when they shouldn't. Also, this condition can still occur when using FASTCGI and SCGI. In that situation it is actually worse, as there is no deadlock detection mechanism like mod_wsgi's, and so the FASTCGI/SCGI process just hangs until a user manually kills and restarts it.

> Besides, running the same WSGI application as a SCGI app-server would,
> in the worst scenario, be as fast as running it in the embedded
> interpreter (in fact, I bet it is faster).

If you compare FASTCGI/SCGI on Apache with both mod_wsgi embedded mode and mod_wsgi daemon mode for a simple hello world application, thus effectively testing just the web serving mechanism, the FASTCGI/SCGI implementations are slower. If you compare Apache/mod_wsgi to FASTCGI/SCGI running under a different web server you may well get different results, but because FASTCGI/SCGI require an extra hop, and due to the overhead of 'flup' needed to bridge to WSGI in the backend process, you may find it isn't as good as you think. As pointed out before, though, all this is well and good for a hello world application, but in real world applications the network isn't the bottleneck, and performance differences at this level are swallowed up by those other overheads.

> So, I still understand that there is a basic issue in the model. There
> is no clear win for the couple of problems that are introduced (security
> and architecture).
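For concreteness, the "hello world" application in this sort of comparison is just the canonical WSGI callable; the same code can sit under mod_wsgi, behind flup's FASTCGI/SCGI servers, or on the stdlib wsgiref server, so timing it exercises only the web serving mechanism. A sketch, not the actual benchmark code:

```python
# Canonical WSGI "hello world". Because the application itself does
# almost no work, benchmarking it compares only the serving path
# (embedded vs daemon vs FASTCGI/SCGI), not the application.

def application(environ, start_response):
    body = b'Hello World!\n'
    start_response('200 OK', [
        ('Content-Type', 'text/plain'),
        ('Content-Length', str(len(body))),
    ])
    return [body]

if __name__ == '__main__':
    # Standalone check using the stdlib reference server.
    from wsgiref.simple_server import make_server
    httpd = make_server('127.0.0.1', 8000, application)
    httpd.handle_request()  # serve one request, then exit
```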
True, they are simply different ways of doing things, and as mentioned before it often comes down to what people find easier to configure and use. As much as we argue about embedded versus FASTCGI/SCGI like solutions, other people, such as the Pylons community, would outright refuse to use either. Their religion is that running the Paste HTTP server inside of the Python application behind mod_proxy is the only true path.

So, choice is good. :-)

Graham

_______________________________________________
Cherokee mailing list
[email protected]
http://lists.octality.com/listinfo/cherokee
