> On 1 Nov 2015, at 11:58 am, Dev Mukherjee <[email protected]> wrote:
>
> Hi all,
>
> The following is more a best practices question.
>
> We've been developing WSGI apps for a while, and also maintain a REST server
> micro-framework. Our applications, like everything else in the Python world,
> are made of micro-frameworks. We would typically use something like webapp2 to
> serve out "pages" and then build APIs using prestans.
>
> Both frameworks provide routers, and we end up having routes in Apache like
>
> Alias /assets/ /srv/app/static/assets/
> Alias /js/ /srv/app/static/js/
>
> WSGIScriptAliasMatch ^/api/(.*) /srv/app/wsgi/api.wsgi
> WSGIScriptAliasMatch ^/(.*) /srv/app/wsgi/app.wsgi
>
> where the two WSGI endpoints point to routers provided by the two frameworks.
I actually highly discourage the use of WSGIScriptAliasMatch as it can do
unexpected things as far as its effect on the relationship between SCRIPT_NAME
and PATH_INFO goes. There is generally very little need for it.
The configuration above could be done as:
WSGIScriptAlias /api/ /srv/app/wsgi/app2.wsgi
WSGIScriptAlias / /srv/app/wsgi/app1.wsgi
BTW, I presume you meant them to refer to different WSGI script files.
Also, for '/api/', that would result in SCRIPT_NAME being '/api', with the
remainder of the URL in PATH_INFO. In other words, the WSGI application will
not see itself as notionally being mounted at the root of the site.
If you wanted both WSGI applications to think they were mounted at the root of
the site, you would use:
WSGIScriptAlias /api/ /srv/app/wsgi/app2.wsgi/api/
WSGIScriptAlias / /srv/app/wsgi/app1.wsgi
Note that I'm not sure whether the trailing slash is needed on the end of the
last argument of the first line. It shouldn't harm anything if it isn't needed,
but do check.
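One easy way to check is to temporarily point the alias at a throwaway WSGI
script which just echoes back how the URL was split. This is only a minimal
sketch for verification, not something you would deploy:

def application(environ, start_response):
    # Echo back how Apache/mod_wsgi split the URL for this request.
    body = 'SCRIPT_NAME=%(SCRIPT_NAME)r\nPATH_INFO=%(PATH_INFO)r\n' % environ
    body = body.encode('UTF-8')
    response_headers = [('Content-Type', 'text/plain'),
                        ('Content-Length', str(len(body)))]
    start_response('200 OK', response_headers)
    return [body]

Request a few URLs under each mount point and you can see exactly what each
application would receive.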
I don't remember exactly how WSGIScriptAliasMatch does the breaking up between
SCRIPT_NAME and PATH_INFO and where it can cause surprises, which is why I
suggest it be avoided unless you have no choice.
> Most examples (in the mod_wsgi docs), and the configuration of
> mod_wsgi-express and frameworks like werkzeug, seem to suggest that a WSGI
> app should have a single WSGI endpoint, and then perhaps use a middleware to
> wrap/dispatch the routes?
If you are talking about taking two distinct WSGI applications, implementing
micro services, which you just happen to want to appear under different sub
URLs of the one site, I completely disagree with the idea that you must
composite them together by using a WSGI middleware that grafts them into the
same process.
The use of WSGI middleware to graft together what are really distinct service
end points is really a result of the limitations of whatever WSGI server is
being used. In Apache/mod_wsgi you don't need to do that, as you can handle it
at the Apache level, plus rely on mod_wsgi to separate the distinct WSGI
applications into separate Python sub interpreters in the same process, or
better still, have them run in separate daemon process groups.
WSGIDaemonProcess api processes=3 threads=3
WSGIDaemonProcess main processes=2 threads=2

WSGIScriptAlias /api/ /srv/app/wsgi/app2.wsgi process-group=api application-group=%{GLOBAL}
WSGIScriptAlias / /srv/app/wsgi/app1.wsgi process-group=main application-group=%{GLOBAL}
The reason using separate daemon process groups for each is so much better than
using a WSGI middleware within one interpreter context is that you can then
separately control the number of processes/threads used for each.
This flexibility is very important because those different WSGI applications,
the web UI and the REST API, may have entirely different profiles for the
amount of traffic, whether they are CPU or I/O bound, the length of response
times, etc. To treat them as the same and bundle them into the one process
means you aren't going to be able to tune the WSGI server as readily as you
otherwise might.
That said, even within one WSGI application or another, there can be widely
different requirements as well. In that case, further dividing up the URL
namespace, so that you separate work across daemon process groups based on
things like CPU usage, response times, etc., can help. This is something that
is impossible to do with a WSGI server such as gunicorn by itself.
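As an illustration only, say some hypothetical /reports/ URLs within the main
application were particularly CPU heavy. You could carve just those out into
their own daemon process group, using the same trailing path trick as above so
the application still believes it is mounted at the root:

WSGIDaemonProcess reports processes=2 threads=1

WSGIScriptAlias /reports/ /srv/app/wsgi/app1.wsgi/reports/ process-group=reports application-group=%{GLOBAL}
WSGIScriptAlias / /srv/app/wsgi/app1.wsgi process-group=main application-group=%{GLOBAL}

The same application code gets loaded in both daemon process groups, but Apache
sends the /reports/ traffic to the dedicated group, which can then be tuned for
that workload independently of everything else.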
I have talked about this idea of breaking up applications vertically and
sending URLs into different process groups so they can be tuned for their
respective workloads. You can find what I had to say in:
http://blog.dscpl.com.au/2014/02/vertically-partitioning-python-web.html
> Is there a correct way of addressing this? Any thoughts / experiences?
>
> If middlewares are the solution, any suggestions on where / which frameworks
> to look at?
I don't think middlewares are necessarily the solution. I strongly believe
partitioning should be managed at a higher level.
This doesn't mean you can't use such middleware, and the Paste library has a
configuration file driven approach for grafting together WSGI applications at
different sub URL contexts so they can all run in one process.
I would only use such WSGI middlewares as a fallback though, when you need to
run it all in one process as part of development, or if you had no other choice
because you were deploying to a hosting service that didn't give you the
flexibility to use a decent WSGI server, or provide some means at their routing
layer to direct traffic for different sub URLs to different backends.
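For what it is worth, the guts of such a dispatching middleware isn't much
code. The following is only a sketch, with names of my own invention rather
than from any particular library, but it shows the key step of shifting the
matched prefix from PATH_INFO across to SCRIPT_NAME, which is what Paste style
urlmap composition does for you:

class URLMap(object):

    def __init__(self, mounts, default):
        self.mounts = mounts      # list of (prefix, wsgi_app), no trailing '/'
        self.default = default    # application notionally mounted at the root

    def __call__(self, environ, start_response):
        path = environ.get('PATH_INFO', '')
        for prefix, app in self.mounts:
            if path == prefix or path.startswith(prefix + '/'):
                # Preserve WSGI semantics by moving the prefix across.
                environ['SCRIPT_NAME'] = environ.get('SCRIPT_NAME', '') + prefix
                environ['PATH_INFO'] = path[len(prefix):]
                return app(environ, start_response)
        return self.default(environ, start_response)

You would then compose things with something like:

application = URLMap([('/api', api_app)], main_app)

where api_app and main_app are your two WSGI applications.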
Even with such WSGI middleware in place, you can still use the above with
Apache/mod_wsgi to map the different URLs into different processes. Now that
you are using WSGI middleware for grafting, though, you end up importing
potentially dead code into each process, as the parts of the URL namespace
that process isn't handling will be loaded as well, unless the WSGI middleware
is smart enough to do lazy loading, which usually they aren't.
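If you did want the lazy loading behaviour, a wrapper along these lines could
defer the import until a request actually arrives for that part of the URL
namespace. Again just a sketch, and the module names in the usage example,
which reuses the URLMap sketch from above, are hypothetical:

import importlib

class LazyApp(object):

    def __init__(self, module_name, attr='application'):
        self.module_name = module_name
        self.attr = attr
        self._app = None

    def __call__(self, environ, start_response):
        # Import the real WSGI application only on first use.
        if self._app is None:
            module = importlib.import_module(self.module_name)
            self._app = getattr(module, self.attr)
        return self._app(environ, start_response)

application = URLMap([('/api', LazyApp('api_module'))], LazyApp('main_module'))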
The next level above doing the separation with Apache/mod_wsgi alone is where
you are using Docker to bundle up the separate WSGI applications. In this case
you use Apache purely as a front end to proxy through to different Docker
containers, with each WSGI application running in its own container, where
inside the Docker containers you can use mod_wsgi-express.
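The front end Apache configuration then reduces to plain proxying. Something
like the following, where the container port numbers are just assumptions for
the sake of the example, and mod_proxy plus mod_proxy_http need to be loaded:

ProxyPass /api/ http://localhost:8001/api/
ProxyPassReverse /api/ http://localhost:8001/api/
ProxyPass / http://localhost:8000/
ProxyPassReverse / http://localhost:8000/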
I talk about that topic in:
http://blog.dscpl.com.au/2015/06/proxying-to-python-web-application.html
http://blog.dscpl.com.au/2015/07/redirection-problems-when-proxying-to.html
http://blog.dscpl.com.au/2015/07/using-apache-to-start-and-manage-docker.html
As far as handling this at the level of a PaaS goes, the typical PaaS doesn't
provide such support.
Amusingly, older types of hosting services such as WebFaction can, but Heroku
and OpenShift 2 cannot.
Next generation PaaS offerings coming out, such as OpenShift 3 (based around
Docker and Kubernetes), will allow you to handle vertically separating WSGI
applications under sub URLs of the same host name.
On OpenShift 3, for example, you can deploy your two separate WSGI
applications and, when exposing each service using a route, specify a path as
well as a hostname. The routing layer of OpenShift will then handle passing
HTTP requests under the different URL namespaces through to the right backend
for you. This means you don't need to set up Apache to be doing such proxying
yourself.
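From memory that is along the lines of:

oc expose service api --hostname=www.example.com --path=/api
oc expose service main --hostname=www.example.com

with the service and host names here obviously being placeholders for your
own; do check 'oc expose --help' for the exact options.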
OpenShift 3 has some really interesting capabilities around handling of many
micro services. This is not just related to routing and exposing them under the
one site at different URLs, but also the fact that each micro service can run
independently, with different CPU and memory resources allocated to them. This
way you can match the resources allocated to the actual amount used by your
tuned WSGI server and application.
You therefore don't have the situation you do with the current generation of
PaaS, where you get a fixed bucket of resources and never use it all. With
those, you either screw around all the time with your WSGI server
processes/threads to try and fill the space, or you give up and waste
resources when adding more instances.
With OpenShift, you tune your WSGI server and application as best you can,
then set CPU and memory based on what that uses. When you need to scale, you
simply create more replicas. You don't have wasted CPU and memory, as your
allocation is a more accurate depiction of what is actually used. Thus when
you scale you can fit more instances in from your global allocation of CPU
and memory.
So the important difference here is that next generation PaaS has your CPU
and memory allocation per project, not per instance. That way you can divide
up the allocation however you see fit. This need not even be restricted to a
single WSGI application, as within the one project you can run more than one
service (api, main, database, etc.), all drawing from the project level
bucket of CPU and memory. You thus have maximum flexibility.
Of course monitoring becomes even more important in this than it has in the
past. If you don’t have good monitoring, you are going to lack the ability to
properly tune your application and WSGI server, understand what resources they
do use, and so make the most of the new flexibility to break up resources.
Anyway, hopefully you understand this ramble.
Graham