> On 1 Nov 2015, at 11:58 am, Dev Mukherjee <[email protected]> wrote:
> 
> Hi all,
> 
> The following is more a best practices question. 
> 
> We've been developing WSGI apps for a while, and also maintain a REST server 
> micro-framework. Our applications, like everything else in the Python world, 
> are made of micro-frameworks. We would typically use something like webapp2 to 
> serve out "pages" and then build APIs using prestans. 
> 
> Both frameworks provide routers, and we end up having routes in Apache like
> 
>     Alias       /assets/        /srv/app/static/assets/
>     Alias       /js/            /srv/app/static/js/
> 
>     WSGIScriptAliasMatch    ^/api/(.*)  /srv/app/wsgi/api.wsgi
>     WSGIScriptAliasMatch    ^/(.*)      /srv/app/wsgi/app.wsgi
> 
> where the two WSGI endpoints point to routers provided by the two frameworks.

I would actually highly discourage the use of WSGIScriptAliasMatch, as it can do 
unexpected things as far as its effect on the relationship between SCRIPT_NAME 
and PATH_INFO goes. There is generally very little need for it.

The configuration above could be done as:

WSGIScriptAlias /api/ /srv/app/wsgi/app2.wsgi
WSGIScriptAlias / /srv/app/wsgi/app1.wsgi

BTW, I presume you meant them to refer to different WSGI script files.

Also, for ‘/api/’, that would result in SCRIPT_NAME being ‘/api’, with the 
remainder of the URL being in PATH_INFO. In other words, the WSGI application 
will not see itself as notionally being mounted at the root of the site.
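
A quick way to verify what the application will actually see is to temporarily 
drop in a trivial WSGI script that just echoes those two variables back. A 
minimal sketch (reusing the script file paths from above):

import json

def application(environ, start_response):
    # Report how Apache/mod_wsgi split the request URL for this mount.
    body = json.dumps({
        'SCRIPT_NAME': environ.get('SCRIPT_NAME'),
        'PATH_INFO': environ.get('PATH_INFO'),
    }).encode('utf-8')
    start_response('200 OK', [('Content-Type', 'application/json'),
                              ('Content-Length', str(len(body)))])
    return [body]

Hit a few URLs under each mount point and check the values that come back.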

If you wanted both WSGI applications to think they were mounted at the root of 
the site, you would use:

WSGIScriptAlias /api/ /srv/app/wsgi/app2.wsgi/api/
WSGIScriptAlias / /srv/app/wsgi/app1.wsgi

Note that I am not sure whether the trailing slash is needed on the end of the 
last argument of the first line. It shouldn’t harm anything if it isn’t needed, 
I don’t think, but do check.
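
If I have the mechanics right, a request for /api/users under that 
configuration should then arrive with:

    SCRIPT_NAME = ''
    PATH_INFO = '/api/users'

rather than SCRIPT_NAME of ‘/api’ and PATH_INFO of ‘/users’. Again, verify this 
with something like the echo application above.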

I don’t remember exactly how WSGIScriptAliasMatch does the breaking up between 
SCRIPT_NAME and PATH_INFO and where it can cause surprises, which is why I 
suggest it be avoided unless you have no choice.

> Most examples (in the mod_wsgi docs), and the configuration of 
> mod_wsgi-express and frameworks like werkzeug, seem to suggest that a WSGI app 
> should have a single WSGI endpoint, and then perhaps use a middleware to 
> wrap/dispatch the routes?

If you are talking about taking two distinct WSGI applications, implementing 
micro services, which you just happen to want to appear under different sub URLs 
of the one site, I completely disagree with the idea that you must composite 
them together by using a WSGI middleware that grafts them into the same process.

The use of WSGI middleware to graft together what are really distinct service 
end points is really a result of the limitations of whatever WSGI server is 
being used. With Apache/mod_wsgi you don’t need to do that, as you can handle it 
at the Apache level, plus rely on mod_wsgi to separate the distinct WSGI 
applications into separate Python interpreter namespaces in the same process, 
or better still, have them run in separate daemon process groups.

WSGIDaemonProcess api processes=3 threads=3
WSGIDaemonProcess main processes=2 threads=2

WSGIScriptAlias /api/ /srv/app/wsgi/app2.wsgi process-group=api application-group=%{GLOBAL}
WSGIScriptAlias / /srv/app/wsgi/app1.wsgi process-group=main application-group=%{GLOBAL}

The reason why using separate daemon process groups for each is so much better 
than using a WSGI middleware within one interpreter context is that you can then 
separately control the number of processes/threads used for each.

This flexibility is very important because those different WSGI applications, 
the web UI and the REST API, may have entirely different profiles for the amount 
of traffic, whether they are CPU or I/O bound, the length of response times etc. 
To treat them as the same and bundle them into the one process means you aren’t 
likely to be able to tune the WSGI server as well as you otherwise might.

That said, even within a single WSGI application there can be widely different 
requirements as well. In that case, dividing up the URL namespace even further, 
so that you separate work across daemon process groups based on things like CPU 
usage, response times etc, can help. This is something that is impossible to do 
with a WSGI server such as gunicorn by itself.
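
For example (the /api/reports/ sub URL and the tuning numbers here are purely 
illustrative), you could carve a CPU heavy part of the API out into its own 
daemon process group, keeping the more specific mount before the less specific 
ones:

WSGIDaemonProcess reports processes=2 threads=1
WSGIScriptAlias /api/reports/ /srv/app/wsgi/app2.wsgi/api/reports/ process-group=reports application-group=%{GLOBAL}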

I have talked about this idea of breaking up applications vertically and 
sending URLs into different process groups so they can be tuned for their 
respective workloads. You can find what I had to say in:

http://blog.dscpl.com.au/2014/02/vertically-partitioning-python-web.html

> Is there a correct way of addressing this? Any thoughts / experiences?
> 
> If middlewares are the solution, any suggestions on where / which frameworks 
> to look at?

I don’t think middlewares are necessarily the solution. I strongly believe 
partitioning should be managed at a higher level.

This doesn’t mean you can’t use such middleware, and the Paste library has a 
configuration file driven approach for grafting together WSGI applications at 
different sub URL contexts so they can all run in one process.
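
For what it is worth, the core of what such grafting middleware does is quite 
small. A minimal sketch of the idea (not Paste itself, and with trivial 
stand-in applications) is:

def make_app(label):
    # Stand-ins for the two real WSGI application objects.
    def app(environ, start_response):
        body = label.encode('utf-8')
        start_response('200 OK', [('Content-Type', 'text/plain'),
                                  ('Content-Length', str(len(body)))])
        return [body]
    return app

class PrefixDispatcher:
    # Route requests to sub applications based on URL prefix.

    def __init__(self, default, mounts):
        self.default = default
        # Longest prefixes first so '/api/v2' wins over '/api'.
        self.mounts = sorted(mounts.items(), key=lambda item: -len(item[0]))

    def __call__(self, environ, start_response):
        path = environ.get('PATH_INFO', '')
        for prefix, app in self.mounts:
            if path == prefix or path.startswith(prefix + '/'):
                # Shift the prefix from PATH_INFO onto SCRIPT_NAME so the
                # sub application sees itself mounted at that sub URL.
                environ['SCRIPT_NAME'] = environ.get('SCRIPT_NAME', '') + prefix
                environ['PATH_INFO'] = path[len(prefix):]
                return app(environ, start_response)
        return self.default(environ, start_response)

application = PrefixDispatcher(make_app('main'), {'/api': make_app('api')})

The dispatcher shifts the matched prefix from PATH_INFO onto SCRIPT_NAME, which 
is exactly the convention a well behaved WSGI application relies on to know 
where it is mounted.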

I would only use such WSGI middlewares as a fallback though, when you need to 
run it all in one process as part of development, or if you had no other choice 
because you were deploying to a hosting service that didn’t give you the 
flexibility to use a decent WSGI server, or provide some means at its routing 
layer to direct traffic for different sub URLs to different backends.

Even with such WSGI middleware in place, you can still use the above with 
Apache/mod_wsgi to map the different URLs into different processes. Now that you 
are using WSGI middleware for the grafting, though, it means you end up 
importing potentially dead code into each process, as the parts of the URL 
namespace that process isn’t handling will be loaded as well, unless the WSGI 
middleware is smart enough to do lazy loading, which usually it isn’t.
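
If you did want to graft in one process but avoid the dead code problem, one 
approach (just a sketch, not a feature of any particular library) is a wrapper 
that defers importing each sub application until its first request:

import importlib

class LazyApp:
    # Defer importing a WSGI application until its first request.

    def __init__(self, module_name, attribute='application'):
        self.module_name = module_name
        self.attribute = attribute
        self.app = None

    def __call__(self, environ, start_response):
        if self.app is None:
            # Only pay the import cost if this part of the URL namespace is
            # actually hit in this process. A concurrent first request may
            # trigger the import twice, but import_module is idempotent, so
            # both threads end up with the same application object.
            module = importlib.import_module(self.module_name)
            self.app = getattr(module, self.attribute)
        return self.app(environ, start_response)

You could then combine it with a dispatcher such as the earlier sketch, e.g. 
PrefixDispatcher(LazyApp('site'), {'/api': LazyApp('api')}), where ‘site’ and 
‘api’ are hypothetical module names.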

The next level above doing the separation with Apache/mod_wsgi alone is if you 
are using Docker to bundle up the separate WSGI applications. In this case you 
use Apache purely as a front end to proxy through to the different Docker 
containers, with each WSGI application running in its own container, where 
inside the Docker containers you can use mod_wsgi-express.
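
The proxying side of that can be as simple as the following (the backend ports 
here are made up, and you need mod_proxy and mod_proxy_http loaded), with the 
more specific ProxyPass listed first:

ProxyPass /api/ http://localhost:8001/api/
ProxyPassReverse /api/ http://localhost:8001/api/
ProxyPass / http://localhost:8000/
ProxyPassReverse / http://localhost:8000/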

I talk about that topic in:

http://blog.dscpl.com.au/2015/06/proxying-to-python-web-application.html
http://blog.dscpl.com.au/2015/07/redirection-problems-when-proxying-to.html
http://blog.dscpl.com.au/2015/07/using-apache-to-start-and-manage-docker.html

As far as handling this at the level of a PaaS goes, the typical PaaS doesn’t 
provide such support.

Amusingly older types of hosting services such as WebFaction can, but Heroku 
and OpenShift 2 cannot.

Next generation PaaS offerings coming out, such as OpenShift 3 (based around 
Docker and Kubernetes), will allow you to use the platform itself to handle 
vertically separating WSGI applications under sub URLs of the same host name.

On OpenShift 3, for example, you can deploy your two separate WSGI applications, 
and when you expose a service using a route, you can specify a path as well as a 
hostname. The routing layer of OpenShift will then handle passing HTTP requests 
under the different URL namespaces through to the right service for you. This 
means you don’t need to set up Apache to be doing such proxying.
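
From memory, the route definition ends up looking something like the following, 
with the path field doing the work. The names and hostname here are 
placeholders, so check the OpenShift documentation for the exact fields:

apiVersion: v1
kind: Route
metadata:
  name: api
spec:
  host: www.example.com
  path: /api
  to:
    kind: Service
    name: api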

OpenShift 3 has some really interesting capabilities around handling of many 
micro services. This is not just related to routing and exposing them under the 
one site at different URLs, but also the fact that each micro service can run 
independently, with different CPU and memory resources allocated to it. This 
way you can adjust the resources allocated to match the actual amount used by 
your tuned WSGI server and application.

You therefore don’t have the situation you get with current generation PaaS, 
where you have a fixed bucket of resources and never use it all. You either 
screw around all the time with your WSGI server processes/threads to try and 
fill the space, or you give up and waste resources when adding more instances.

With OpenShift, you tune your WSGI server and application as best you can, then 
set CPU and memory based on what that uses. When you need to scale, you simply 
create more replicas. You don’t have wasted CPU and memory, as your allocation 
is a more accurate depiction of what is used. Thus when you scale you can fit 
more instances in from your global allocation of CPU and memory.

So the important difference here is that a next generation PaaS has your CPU and 
memory allocation per project, not per instance. That way you can divide up the 
allocation however you see fit. This need not even be restricted to a single 
WSGI application, as within the one project you can run more than one service: 
api, main, database etc, and they all draw from the project level bucket of CPU 
and memory. You thus have maximum flexibility.

Of course, monitoring becomes even more important in this than it has been in 
the past. If you don’t have good monitoring, you are going to lack the ability to 
properly tune your application and WSGI server, understand what resources they 
do use, and so make the most of the new flexibility to break up resources.

Anyway, hopefully you understand this ramble.

Graham
