2009/11/5 Graham Leggett <minf...@sharp.fm>:
> Jim Jagielski wrote:
>
>> Let's get 2.4 out. And then let's rip it to shreds and drop
>> buckets/brigades and fold in serf.
>
> I think we should decide on exactly what problem we're trying to solve,
> before we start thinking about how it is to be solved.
>
> I'm keen to teach httpd v3.0 to work asynchronously throughout - still
> maintaining the prefork behaviour as a sensible default[1], but being
> asynchronous and non blocking throughout.
>
> [1] The fact that dodgy module code can leak, crash and be otherwise
> unsociable, and yet the server remains functional, is one of the key
> reasons why httpd still endures.
Sorry, this is a long post, but it was inevitable that I was going to air all this at some point. Now seems as good a time as any.

I'd like to see a more radical architecture change. It should recognise that the job isn't just about serving static files any more, and it should provide much better builtin support for safely hosting content-generating web applications written in other languages.

Before anyone jumps to the conclusion that I want to see even more heavyweight applications being run directly in the Apache server child processes that accept the initial requests, know that I don't want that. I actually want to promote the opposite model, one that would encourage people not to do that.

As a first step, like Jim, I would like to see the current Apache server child processes (workers) become asynchronous. In addition to that, though, I would like to see a means for spawning and monitoring distinct processes outside of the set of worker processes. This would be part of core Apache and would run in the parent process.

There is currently support in APR, and in part in Apache, for 'other' processes via the 'apr_proc_other_child_???()' functions. This support is quite basic, though, and to a large degree you still need to roll your own management routines around it for (re)spawning etc. As a result, modules such as mod_cgid, mod_fastcgi, mod_fcgid and mod_wsgi each have their own process management code for managing their daemon processes and/or manager process.

Technically one could implement this as a distinct module called mod_procd, with an API that other modules could use, and so stop the duplication of all this stuff. Perhaps it needs to go a step further than that, though, and be integrated into core. At present any 'other' processes are dealt with rather harshly on graceful restarts: they are still simply killed off after a few seconds if they don't shut down.
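To make the duplicated logic concrete, here is a minimal sketch of the respawn-and-shutdown handling that modules like mod_cgid or mod_fcgid currently each reimplement. It is written in Python purely for brevity; a real version would be C on top of APR. `ManagedProcess`, `check()` and `graceful_stop()` are hypothetical names, not an existing httpd or APR API, and the sketch assumes POSIX signal semantics. The `grace` parameter stands in for the idea of letting 'other' processes take part in graceful restarts instead of being killed after a fixed few seconds.

```python
import signal
import subprocess
import sys


class ManagedProcess:
    """One 'other' process owned by a hypothetical shared supervisor (sketch only)."""

    def __init__(self, name, argv, max_restarts=3):
        self.name = name
        self.argv = argv
        self.max_restarts = max_restarts
        self.restarts = 0
        self.stopping = False
        self.proc = None

    def start(self):
        self.proc = subprocess.Popen(self.argv)

    def check(self):
        """Called periodically by the supervisor: respawn a dead process, within limits."""
        if self.stopping or self.proc is None or self.proc.poll() is None:
            return  # stopping on purpose, never started, or still running
        if self.restarts < self.max_restarts:
            self.restarts += 1
            self.start()

    def graceful_stop(self, grace=5.0):
        """Ask the process to exit (SIGTERM); force it (SIGKILL) only after `grace` seconds."""
        self.stopping = True
        if self.proc is None:
            return None
        if self.proc.poll() is not None:
            return self.proc.returncode
        self.proc.send_signal(signal.SIGTERM)
        try:
            return self.proc.wait(timeout=grace)
        except subprocess.TimeoutExpired:
            self.proc.kill()  # SIGKILL on POSIX
            return self.proc.wait()


# Usage: a daemon that exits immediately is respawned at most twice.
flaky = ManagedProcess("flaky", [sys.executable, "-c", "pass"], max_restarts=2)
flaky.start()
for _ in range(3):
    flaky.proc.wait()
    flaky.check()

# A long-running daemon is stopped with SIGTERM well within its grace period.
sleeper = ManagedProcess("sleeper", [sys.executable, "-c", "import time; time.sleep(60)"])
sleeper.start()
rc = sleeper.graceful_stop(grace=5.0)
```

Keeping this logic in one shared place is what would let every module get identical restart limits and shutdown behaviour.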
Being able to extend graceful restart semantics to other processes may be worthwhile for some applications.

The next thing I want to see is a revisit of the whole FASTCGI-type ecosystem. From that, a better version of the concept for hosting web applications in disparate languages could be developed, one which modernises it and brings it in as a core feature of Apache. The intent here is to simplify the task both for implementers and for those who wish to deploy applications.

An important part of this would be to switch away from the interface being a socket protocol. Instead, let the web server control both halves of the communication channel between the Apache worker process and the application daemon process. A C API would replace the socket protocol as the interface. The application would no longer have to implement the socket protocol as a foreign process; instead, support for a specific language would be provided by way of a dynamically loaded plugin. That plugin would use embedding to provide support for a particular language, and would simply execute the code in whatever file the enclosing web server code told it to execute.

By way of example, imagine languages such as Python, Perl or Ruby, which now have simplified web server interfaces in the form of WSGI, PSGI and Rack, or even PHP. In the Apache configuration one would simply say that a specific file extension is implemented by a specific named language plugin. One would also indicate that a separate manager process should be started up for managing the processes that handle requests for that language. The specific language plugin would be loaded only after that separate manager process had been spawned, be it by a straight fork or, preferably, fork/exec. This eliminates the problems caused by complex language modules being preloaded into the Apache parent process and conflicting with other languages.
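The plugin arrangement described above can be sketched as follows, again in Python for brevity. The abstract class stands in for the proposed C API. `LanguagePlugin`, `EchoPlugin`, `LanguageManager`, `EXTENSION_MAP` and the `.echo` extension are all hypothetical. The toy plugin only echoes its inputs, whereas a real plugin would embed an interpreter. The point being illustrated is that nothing language specific is loaded until `load()` runs inside the manager process.

```python
import os
from abc import ABC, abstractmethod


class LanguagePlugin(ABC):
    """Stand-in for the proposed C API between the server and a language plugin."""

    name = "abstract"

    @abstractmethod
    def handle(self, script_path, request):
        """Execute the code in script_path for one request and return (status, body)."""


class EchoPlugin(LanguagePlugin):
    """Toy plugin; a real one would embed an interpreter (Python, Perl, Ruby, PHP...)."""

    name = "echo"

    def handle(self, script_path, request):
        return 200, f"{self.name} ran {os.path.basename(script_path)} for {request['path']}"


# Stand-ins for configuration: which plugins exist, and which extension maps to which.
PLUGIN_FACTORIES = {"echo": EchoPlugin}
EXTENSION_MAP = {".echo": "echo"}


def plugin_for(path):
    """Return the plugin name configured for this file's extension, or None."""
    return EXTENSION_MAP.get(os.path.splitext(path)[1])


class LanguageManager:
    """Runs in its own (forked or fork/exec'd) process, never in the httpd parent."""

    def __init__(self, plugin_name):
        self.plugin_name = plugin_name
        self.plugin = None  # nothing language specific is loaded at spawn time

    def load(self):
        """Load the plugin lazily, i.e. only once we are inside the manager process."""
        if self.plugin is None:
            self.plugin = PLUGIN_FACTORIES[self.plugin_name]()
        return self.plugin
```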
The existing mod_php module is a good example of this, causing lots of problems because it drags in libraries which aren't thread safe.

That manager process would then spawn its own language-specific worker processes, as configured, for handling actual requests. Suppose one of the main asynchronous Apache worker processes receives a request and determines that the target resource file is associated with a specific language. It then works out how to connect to the language-specific worker processes and proxies the request to them for handling. On the language worker process side, the web server part of the code in that process receives the proxied request and then calls into the plugin code to have the request handled against the target file. Most language solutions for web applications aren't asynchronous. These language-specific worker processes would therefore still use traditional threading techniques, or could even be single threaded where the language, or extension modules for that language, aren't thread safe, as is the case for PHP.

In the bigger scheme of things, what we would have is a set of asynchronous front end Apache worker processes which handle static file requests. A request for a resource implemented in a specific complex language would instead be proxied internally to other processes managed within the sphere of the web server. The language-specific worker processes could be single threaded or multithreaded, or could even still provide their own asynchronous API. The important thing is that we aren't loading support for these languages into the Apache parent process of the main Apache worker processes. The separation isn't as great as with FASTCGI. The arrangement remains quite tightly integrated, with the web server code effectively controlling both ends of the communication channel used when proxying, as well as all the process management.
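The routing decision made by the asynchronous front end can be sketched in a few lines. The Python sketch assumes a mapping from file extension to a hypothetical `ProcessGroup` of language workers reachable over local sockets. The socket paths are invented for illustration. Plain round robin is used only to keep it short; a real dispatcher would more likely pick an idle worker.

```python
import itertools
import os


class ProcessGroup:
    """A pool of language-specific worker processes reachable over local sockets."""

    def __init__(self, language, addresses):
        self.language = language
        self.addresses = list(addresses)
        self._rotation = itertools.cycle(self.addresses)

    def next_worker(self):
        # Plain round robin for brevity; a real dispatcher would prefer idle workers.
        return next(self._rotation)


def route(path, groups_by_extension):
    """Front-end decision: serve the file directly, or proxy to a language worker."""
    group = groups_by_extension.get(os.path.splitext(path)[1])
    if group is None:
        return ("static", path)
    return ("proxy", group.next_worker())


# Usage with hypothetical socket paths for two PHP workers.
php = ProcessGroup("php", ["/run/httpd/php-0.sock", "/run/httpd/php-1.sock"])
routes = {".php": php}
```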
The actual socket protocol used where the proxying occurs is not important at this point. It is a private protocol within the web server and is not intended to be exposed publicly. In other words, the protocol is only used to communicate with the web server's own processes on the same server. It is not intended for communicating with processes on other servers, as is possible with FASTCGI; for that, traditional HTTP proxying techniques can be used. Because the protocol is private, it can be changed and updated as needed based on the requirements of the web server. You aren't beholden to some external community. Nor do you get stuck with a protocol that never gets updated and over time becomes a poor solution for the problem it needs to solve. In some respects this has happened with FASTCGI, which has never been updated to make it more modern and usable.

The protocol might be packet based like FASTCGI or AJP, but might also be more like HTTP, or something simplified like WAKA. A packet based protocol wouldn't be strictly necessary, as we probably want to avoid trying to multiplex multiple requests over a single proxied channel. The important thing, though, is to be able to handle 100-continue end to end as necessary, something which FASTCGI can't currently do.

Using a distinct manager process for each language also has other benefits. The first is that you could have multiple instances which are configured differently; in other words, multiple process groups to which requests for that language could be delegated. These could differ in a number of respects:
- the number of processes they create for handling requests;
- the number of threads in those processes;
- whether worker processes are precreated or created only on demand;
- how often they are recycled;
- whether they are killed off when idle;
- what language modules are preloaded into the manager process before the language-specific worker processes are forked.
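To illustrate the private proxy protocol described above, here is one possible packet-based framing, chosen arbitrarily. It is not FASTCGI, AJP or WAKA. The frame types, header layout and function names are hypothetical. Each frame is a 1-byte type plus a 4-byte length. There is no request ID because the channel is not multiplexed. A `CONTINUE` frame is how the worker could tell the front end that the application has started reading the request body, so the interim `100 Continue` is only sent to the client at that point.

```python
import struct

# Hypothetical frame types for the private front end <-> language worker channel.
HEADERS, CONTINUE, BODY, END = 1, 2, 3, 4

# Frame header: 1-byte type, then 4-byte big-endian payload length.
FRAME_HEADER = struct.Struct("!BI")


def encode_frame(frame_type, payload=b""):
    return FRAME_HEADER.pack(frame_type, len(payload)) + payload


def decode_frames(buf):
    """Split buf into complete (type, payload) frames; return them plus leftover bytes."""
    frames = []
    offset = 0
    while len(buf) - offset >= FRAME_HEADER.size:
        frame_type, length = FRAME_HEADER.unpack_from(buf, offset)
        start = offset + FRAME_HEADER.size
        if start + length > len(buf):
            break  # partial frame: wait for more bytes from the socket
        frames.append((frame_type, buf[start:start + length]))
        offset = start + length
    return frames, buf[offset:]


def interim_response_for(frame_type):
    """Interim bytes the front end writes to the HTTP client for a worker frame, if any."""
    if frame_type == CONTINUE:
        # The application has begun reading the body, so only now ask the client to
        # send it: this is what makes 100-continue work end to end.
        return b"HTTP/1.1 100 Continue\r\n\r\n"
    return b""
```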
Secondly, process groups could also drop privileges to users other than the Apache user. This would make it much simpler to run different applications, or different users' code, as different users, thereby avoiding the whole mess that is suexec.

A third benefit is that the Apache configuration files themselves would really only need details of how to map URLs to the managed processes for each language. The configuration for each language or process group could be distinct. This would allow the configuration for a process group to be changed, and that process group restarted, without having to restart the whole of Apache.

As to the language-specific worker processes, the web server code would control the main loop in those processes as well, and this gives better options for controlling them. It includes better management of process recycling when a set number of requests has been reached, when processes are idle or when memory usage bloats out. It also opens up the option of instrumenting the code with hooks for collecting statistics. These could show how efficiently processes are handling requests, whether they are getting overloaded, and whether the number of processes/threads in a process group needs to be tuned further to cope with load, or even cut back if under-utilised.

Finally, we may even get a chance to improve how error logging is done for such hosted language applications. The current FASTCGI method of proxying error messages back via the request channel has its problems. In particular, there is technically no error channel during process startup or between requests. If we are looking at error logging, maybe we can also come up with a better system for handling error logging across a large number of virtual hosts.
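The per-group settings and recycling checks mentioned above can be expressed as plain data plus a small policy function. The following Python sketch is illustrative only. The field names and defaults are hypothetical, not real httpd directives. Actually dropping to `user` would happen in the forked child, for example via `os.setgid()`/`os.setuid()`, and is not shown.

```python
from dataclasses import dataclass


@dataclass
class ProcessGroupConfig:
    """Per-group settings; names and defaults are illustrative, not real directives."""
    name: str
    user: str = "apache"           # account the group's workers would run as
    processes: int = 2
    threads: int = 1
    max_requests: int = 1000       # recycle after this many requests (0 = never)
    idle_timeout: float = 300.0    # seconds without a request before shutdown (0 = never)
    max_rss_bytes: int = 0         # recycle when resident memory exceeds this (0 = no limit)


@dataclass
class WorkerStats:
    """Figures the server-controlled main loop could collect for each worker."""
    requests: int = 0
    idle_seconds: float = 0.0
    rss_bytes: int = 0


def recycle_reason(cfg, stats):
    """Return why a language worker should be replaced, or None to keep it."""
    if cfg.max_requests and stats.requests >= cfg.max_requests:
        return "max-requests"
    if cfg.idle_timeout and stats.idle_seconds >= cfg.idle_timeout:
        return "idle"
    if cfg.max_rss_bytes and stats.rss_bytes > cfg.max_rss_bytes:
        return "memory"
    return None
```

Because each group carries its own config object, changing one group's limits and restarting just that group leaves the rest of the server untouched.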
Having to use VirtualHost, or other complex systems, to get separate live error logs for lots of virtual hosts can be a pain. VirtualHost doesn't scale to a large number of virtual hosts, and it isn't particularly dynamic because of the large cost of restarting Apache to add new hosts. So, solve dynamic configuration of virtual hosts with separate application error logging and you are definitely on to a winner.

Anyway, I hope some of you can see an inkling of what I am suggesting. I have left out many details based on my opinions and previous thinking about this. It would certainly be much easier to describe this on a whiteboard where I could draw pictures.

I guess overall I just want to see if we can come up with a more modern web server that is better for complex dynamic web application hosting as well as static file serving. I don't want to see us just go asynchronous and become yet another static file serving web server like nginx, lighttpd or cherokee. That would mean ignoring the problem of dynamic web applications and punting it on to a less than capable FASTCGI ecosystem.

Thoughts?

Graham