On 24 November 2014 at 04:59, Jeff Trawick <traw...@gmail.com> wrote:
> If you're doing Python web apps it would be cool to "pip install httpd
> FRAMEWORK-httpd-wiring" and have a command that wires it up based on
> framework settings and a bit of other declarative configuration. (similar
> for other ecosystems with a packaging/build infrastructure) mod_wsgi
> actually has a version in PyPI that works like this, although it doesn't
> bring httpd with it.

Downloading and compiling the whole of httpd as a side effect of doing a Python pip install isn't really practical. For a start, the process would just take too long. It also doesn't solve the problem that many systems will not have the dependencies needed to compile it. You don't want to also be separately downloading and compiling APR, APR-UTIL and PCRE now that they aren't bundled with the Apache source code.

I have tried going down that path, albeit not triggered by pip, when trying to create a buildpack for Heroku that could bring mod_wsgi to that PaaS. It was a right pain, especially since the compiled components would chew up a significant part of the slug size allowance that Heroku gave you. In the end I gave up on it, because it was so customised and so unsupported by Heroku that no one would be likely to use it.

So the pip installable mod_wsgi instead relies on you already having the httpd and httpd-dev packages installed, plus any dependencies for those. This still doesn't help with PaaS services that take such a narrow view of what they will allow you to do. For example, Heroku will not provide the httpd and httpd-dev packages in the operating system image they use. Those packages would let people run Apache with their own custom configurations and compile and use their own Apache modules.
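A rough way to see whether a given system or container image meets those prerequisites is to look for `apxs`, the Apache extension tool that the httpd-dev packages provide. Because mod_wsgi embeds Python inside Apache, it also helps to ask Python whether it was built with a shared library. The following is a minimal Python sketch under those assumptions. `check_build_prerequisites` is a hypothetical helper and is not part of mod_wsgi. A real build may need more than these two checks.

```python
import shutil
import sysconfig


def check_build_prerequisites() -> dict:
    """Report whether tools that a source build of an Apache module such as
    mod_wsgi typically needs appear to be present on this system.

    Assumptions (for illustration only): finding `apxs` on PATH is treated as
    "httpd-dev is installed", and Py_ENABLE_SHARED is treated as "this Python
    provides a shared libpython that an embedded module could load".
    """
    # apxs ships with httpd-dev/apache2-dev packages; some Debian-based
    # systems may also provide it under the name apxs2.
    apxs = shutil.which("apxs") or shutil.which("apxs2")
    # Py_ENABLE_SHARED is 1 when Python was configured with --enable-shared,
    # 0 for a static build, and may be None where the setting does not apply.
    shared = bool(sysconfig.get_config_var("Py_ENABLE_SHARED"))
    return {
        "apxs": apxs,
        "shared_libpython": shared,
        # LDLIBRARY names the libpython the interpreter was built with,
        # e.g. a .so for shared builds or a .a for static builds.
        "ldlibrary": sysconfig.get_config_var("LDLIBRARY"),
    }


if __name__ == "__main__":
    for name, value in check_build_prerequisites().items():
        print(f"{name}: {value}")
```

Running this inside a PaaS slug or a candidate base image would quickly show whether either piece is missing before you attempt `pip install mod_wsgi`.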
It took me at least a couple of years just to get Heroku to update their Python installations so that they provided shared libraries. Only then could a dynamically loaded embedded system, such as the mod_wsgi module inside Apache, use their Python installations. Before that I would also have had to compile Python from source.

Heroku isn't the only PaaS that has gone down a path which makes it nigh on impossible to use with Apache and a customised setup. OpenShift does actually provide an Apache/Python/mod_wsgi cartridge, but they hardwire the Apache configuration and you cannot change it. That configuration has various problems in the way it is done, and so provides a suboptimal experience. They also use the very old mod_wsgi version that ships with whichever RHEL version they use. Even if you could get around that, you can't change the Apache configuration or even the startup command. Nor is it possible to build an Apache module from scratch, as they don't install the httpd-dev package for RHEL.

The only PaaS where I could do what I wanted and use the pip installable mod_wsgi was dotCloud. This was because it was what became docker. It allowed users to install the missing httpd-dev package in their own space, so it was actually possible to compile custom Apache modules. So for me, and for turning around the rapid decline in mod_wsgi usage caused by the narrow options most PaaS providers give you, docker is definitely the way forward.

The idea of a pip installable mod_wsgi is therefore twofold. The first aim is to work around the fact that Linux distributions ship very out of date versions of packages. Most Linux distributions are over a dozen releases behind on mod_wsgi. The second is that the pip installable mod_wsgi does more than just compile the mod_wsgi Apache module.
It also installs a script called mod_wsgi-express that automatically generates an Apache configuration for you, set up properly for mod_wsgi. This is what Jeff is alluding to when he says 'a command that wires it up based on framework settings and a bit of other declarative configuration'.

This solves another serious problem that mod_wsgi has had over the years: the default Apache configuration isn't particularly appropriate. This is especially the case for the prefork MPM. There, Python code runs in embedded mode inside the Apache child worker processes, rather than in mod_wsgi daemon mode, where the Python code runs in separate processes. Nor is it helped by what I would argue is a somewhat flawed dynamic scaling algorithm for child workers in Apache. That algorithm causes too much process churn, which hurts embedded systems with a large startup cost.

So mod_wsgi-express provides a turnkey solution that sets up Apache with Python as a form of appliance. That will suit the majority of cases, where users are just running a single Python web application. I can take all the knowledge I have accumulated over the years about the best way to set up Apache for Python web applications and avoid problems, and distil it into a custom, streamlined Apache configuration. It can still require some minor tuning to match your specific Python web application, but it does all the core setup that most people wouldn't even think to do for Python.

By providing a best-in-class configuration for that use case, I am trying to combat the perception that Apache is slow and bloated for Python web applications. In fact, that impression usually arises because people are using an old Apache and never set it up properly.

Where does docker fit into this? For mod_wsgi at least, docker means I can provide my own base docker image with Apache 2.4 and the latest mod_wsgi version installed.
A user would then simply create their own docker image deriving from that base image, adding in their Python web application code. For the simplest case, using docker's ONBUILD instruction, the Dockerfile for their project could be one line. They would add a second line if they wanted to override the number of processes/threads used for the Python web application to meet their throughput requirements. The mod_wsgi-express script would deal with everything else for them when generating the configuration.

My view of how docker should perhaps be harnessed is therefore not to provide one docker image for Apache with a generic Apache configuration and leave everything else to the user, as has been done in the past. Instead, create specialised docker images for using Apache in particular roles. Give them a much more minimal interface for customising the configuration. The Apache configuration itself would be generated by a script written by someone who actually understands how to set up Apache for that use case. That person can streamline it and make it run at its best for the narrow use case the docker image was intended for.

I am not actually far off being able to offer this docker appliance image for mod_wsgi and Python web applications. It is mostly a case of finding the time to work out all the requirements for getting it up on the docker hub. I am also trying to sort out issues with the official docker Python image, which, like Heroku, has made the mistake of not shipping shared libraries. So right now I can't base my image on theirs.

The other option was to base it on the official docker image for httpd. For that image, though, they have made it useless as a base for building other Apache modules. At the end of the build they strip out all the dependencies that were originally required to build Apache and any additional modules.
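The "second line" case above only touches a couple of knobs, and the generator fills in everything else. The hypothetical Python sketch below shows the kind of daemon-mode fragment such a script might emit for a given process/thread count. `WSGIRestrictEmbedded`, `WSGIDaemonProcess`, `WSGIProcessGroup`, `WSGIApplicationGroup` and `WSGIScriptAlias` are real mod_wsgi directives. The helper name, the defaults and the choice of directives are assumptions, and this is not the configuration mod_wsgi-express actually produces.

```python
import posixpath


def daemon_mode_config(script: str, processes: int = 2, threads: int = 5,
                       group: str = "app") -> str:
    """Return a minimal Apache config fragment that runs one WSGI script
    in mod_wsgi daemon mode.

    Hypothetical helper for illustration only: mod_wsgi-express generates
    its own, far more complete configuration.
    """
    if processes < 1 or threads < 1:
        raise ValueError("processes and threads must both be at least 1")
    app_dir = posixpath.dirname(script)
    lines = [
        # Each daemon thread handles one request at a time.
        f"# capacity: {processes} processes x {threads} threads = "
        f"{processes * threads} concurrent requests",
        # Do not run Python inside Apache's own child worker processes.
        "WSGIRestrictEmbedded On",
        # Run the application in a fixed number of separate daemon processes.
        f"WSGIDaemonProcess {group} processes={processes} threads={threads} "
        f"python-path={app_dir}",
        f"WSGIProcessGroup {group}",
        # Use the main interpreter; some C extension modules expect this.
        "WSGIApplicationGroup %{GLOBAL}",
        f"WSGIScriptAlias / {script}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(daemon_mode_config("/srv/app/wsgi.py", processes=3, threads=4))
```

The fixed process count in daemon mode sidesteps the process churn of Apache's dynamic scaling of child workers. Processes x threads gives a rough ceiling on concurrent requests, which is the number a user would tune for throughput.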
The issues with the official docker images for both httpd and Python reveal another problem, quite similar to some of the things I have seen with PaaS providers. The people implementing those systems aren't strong Apache users themselves. They therefore create systems they think will suit a wide range of use cases, whereas in fact those systems only suit very narrow use cases and perform poorly in others.

We need to do a better job of highlighting that such solutions aren't actually workable, and of explaining why. If we don't say anything, they won't be changed. People will then continue to have a bad experience of Apache, when the fault lies not with Apache but with how the provider set it up.

Anyway, enough rambling.

Graham