Re: [modwsgi] Re: Converting from modpython's PythonAccessHandler to modwsgi

Graham Dumpleton Thu, 18 Mar 2010 03:41:30 -0700

Sorry for the delay in replying, much to do and many things on my mind.

On 16 March 2010 04:15, Deron Meranda <[email protected]> wrote:
> On Mar 12, 10:56 pm, Graham Dumpleton <[email protected]>
> wrote:
>> On 13 March 2010 07:06, Deron Meranda <[email protected]> wrote:
>> > In particular I use the PythonAccessHandler hook to do all kinds
>> > of authentication/authorization stuff.
>>
>> Hmmm, one shouldn't be using the access handler for authentication and
>> authorization. That is what the authentication and authorization hooks
>> are meant to be for. :-)
>
> Yes.  However the authn/authz hooks only get invoked when one
> also uses the Apache Require directive.  And that semantically
> seems too mismatched for my environment, which has much more
> complex access control rules.  I don't use any of the standard
> HTTP authentication methods.


It is not that simple. I think the thing many missed was that you
aren't restricted to the standard require arguments. The whole point
is that you can define your own arguments for require which your
authorisation handler would interpret. I actually don't remember
anyone ever specifying their own requirement arguments and
interpreting them using a custom mod_python handler. As such, pretty
well all authorisation handlers I saw were kludges and many cheated by
rolling authorisation up with authentication into the authentication
handler. The Django authentication/authorisation examples using
mod_python were like that.
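
A minimal sketch of what interpreting your own require arguments could
look like. It runs outside Apache: the requirement string and the user
attributes are passed in as plain values, and the requirement names
("dept", "clearance") and the function name are hypothetical, chosen
only for illustration.

```python
def requirement_satisfied(requirement, user_attrs):
    """Interpret one custom require argument string, e.g. 'dept sales hr'."""
    parts = requirement.split()
    if not parts:
        return False
    kind, values = parts[0], parts[1:]
    if kind == "dept":
        # Hypothetical rule: the user's department must be one of those listed.
        return user_attrs.get("dept") in values
    if kind == "clearance":
        # Hypothetical rule: the user's clearance must meet a numeric minimum.
        if len(values) != 1 or not values[0].isdigit():
            return False
        return user_attrs.get("clearance", 0) >= int(values[0])
    # Requirement types this handler does not understand are never granted.
    return False
```

With this, a line such as "Require dept sales hr" in the configuration
would be satisfied for a user whose attributes include dept "hr", and
the authorisation decision stays separate from authentication.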

> I'm aware that this means I'm stepping outside of the Apache
> authn/authz model, but just slightly so.
>
> The Access handler on the other hand is always called, and it
> still happens early enough (before the content handler).  And
> for the few content handlers that need it, I can still synthesize
> (or simulate) normal access control by pushing values into
> req.user for example.
>
> For me, I just found it much more convenient to put everything
> security-related (authentication and access control) into the
> access handler; and it will either let requests pass on up to
> the content handlers, or return 403's (or occasionally 503's).
> And for the subset of content handlers that are written in
> Python, it can also pass additional information up to them;
> mostly in the form of Python objects.
>
> Oh, I should mention that I've also used stacked handlers
> in a few cases (listing multiple python handlers in the
> Python*Handler directives).  I would assume that I definitely
> can't do that, for the same apache phase anyway, once I
> start mixing mod_python and mod_wsgi.

Since mod_wsgi doesn't currently allow hooking anything but the final
handler phase for content, obviously not for most of them. In the case
of the content handler, whether both a mod_python handler and mod_wsgi
would be called depends on order precedence and whether mod_python is
loaded in the right order relative to mod_wsgi such that the mod_python
content handler gets to run first. The mod_wsgi handler is only going
to execute and return OK though if the handler type condition is
correct, i.e., the resource type is wsgi-script. If it isn't that, it
returns DECLINED. So order doesn't matter if the mod_wsgi handler is
not actually run.
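
As a rough model of that behaviour, the following pure-Python sketch
walks a chain of content handlers until one of them does not decline.
It does not use the Apache API; the constants, the two handler
functions and the run_handler_chain helper are hypothetical names used
only for illustration.

```python
# Toy model of a content handler chain; not real Apache or mod_wsgi code.
OK = 0
DECLINED = -1

def mod_python_handler(request):
    # Hypothetical: claims only requests explicitly mapped to mod_python.
    if request.get("handler") == "mod_python":
        return OK
    return DECLINED

def mod_wsgi_handler(request):
    # Acts only when the resource type is wsgi-script, otherwise declines.
    if request.get("handler") == "wsgi-script":
        return OK
    return DECLINED

def run_handler_chain(handlers, request):
    """Call handlers in order; the first one that does not decline wins."""
    for handler in handlers:
        status = handler(request)
        if status != DECLINED:
            return handler.__name__, status
    return None, DECLINED
```

In this model a wsgi-script request ends up with mod_wsgi_handler
whichever of the two is listed first, because the other one declines.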

Do you use stackable handlers in non content handler phases? The rules
in mod_python as to how handlers stack in a phase weren't always
workable, in that as soon as you defined a handler in a context, it
overrode the one from a parent context. Thus it was not possible to run
both unless you respecified the handler from the parent context in the
child context along with the other, i.e., there was no way of saying
that a handler was additive to a handler defined in an outer context.
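
The override rule can be illustrated with a small toy model. This is
not mod_python code; the effective_handlers function and the handler
names in the usage note are hypothetical, and None stands for a
context that defines no handler for the phase.

```python
def effective_handlers(contexts):
    """Return the handlers that would run for one phase.

    contexts lists the handler definitions from the outermost context to
    the innermost one. The innermost context that defines anything
    replaces, rather than extends, whatever the outer contexts defined.
    """
    for handlers in reversed(contexts):
        if handlers:
            return list(handlers)
    return []
```

Given a parent context with ["check_session"] and a child context with
["audit_request"], only "audit_request" runs. To run both, the child
has to list ["check_session", "audit_request"] itself.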

I can't remember whether I addressed this or whether I just added it
as an idea in the JIRA issue tracker for mod_python.

> [Technically I guess there is the Apache handler chain,
> and the mod_python handler stack (each with slightly
> different semantics in terms of handling DEFER, etc.).

It shouldn't be different now in respect of behaviour when returning OK
or DECLINED, as I fixed the mistakes mod_python had in the way it did
that.

> So would it be possible for a mod_python handler (stack) and
> a mod_wsgi handler to be on the same Apache handler chain?

Yes as that is a fundamental feature of Apache modules.

> And if so in which order would they be executed?  Not
> that I think I need to do that, but I'm curious.]

Dependent on Apache module load order unless a precedence relationship
is specified in the C code of the modules to indicate order relative
to other named modules. In general the order shouldn't matter between
Apache modules which are hooked in the middle for a phase, like
mod_python is.

> I also in a few cases have used mod_python's multiple
> interpreter feature; where I wanted an extra level of
> separation to help prevent leakage of information between
> python environments.  That though is not strictly necessary,
> so I could easily enough transition to using a single
> python interpreter if needed.

You actually have better control over sub interpreter usage in
mod_wsgi than you do in mod_python as you have the ability to control
which is used dynamically through req.notes, mod_rewrite and a
dispatch function for content phase.
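
A minimal sketch of such a dispatch function, assuming the dispatch
script (WSGIDispatchScript in mod_wsgi 3.X) may define an optional
application_group(environ) function whose return value names the sub
interpreter to use. Treat the function name, its argument and the keys
available in it as assumptions to check against the mod_wsgi
documentation; the per-host policy itself is purely illustrative.

```python
def application_group(environ):
    # Hypothetical policy: one sub interpreter per requested host name,
    # so code for different sites never shares a Python environment.
    host = environ.get("HTTP_HOST", "").split(":")[0].lower()
    # Fall back to a fixed, hypothetical group name when no host is given.
    return host or "default"
```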

> Even if mod_python and mod_wsgi can somewhat coexist (with the
> caveat that mod_python gets started first so that modwsgi
> doesn't get to do some initialization) --- are there any
> issues if you use multiple python interpreters?

The only interpreter which both share is the main interpreter, all
others are distinct to each. Sub interpreter usage is correct in
mod_wsgi and still broken in some areas in mod_python. So any
interpreter issues are going to be the existing ones that mod_python
has in relation to threads and thread local storage.

> Though I
> take it that even the Python folks (Guido) aren't too keen
> on keeping the multiple interpreter support any more.

There are a few quite vocal people, but I did not understand Guido to
be specifically against them.

>> In mod_wsgi, so long as using mod_wsgi 3.X, you can use thread local
>> storage to preserve data between phases, albeit that the WSGI
>> application must be running in embedded mode and in the same
>> interpreter. The first phase in a request would always need to make
>> sure it cleared out any data hanging over from a prior request.
>
> Makes sense.  Though I currently don't use any threading
> model, because as you've mentioned it has been arguably broken
> in mod_python (at least as of the time in history when I wrote
> most of my framework); and also threading causes some grief in
> some other dependencies of mine.  So I've been happy enough with
> the multi-process model.  And the few cases where I had to have
> thread support (e.g., Xapian), I've managed to use a process
> proxy approach to get those things outside of the Apache
> processes -- which isn't necessarily a bad thing, because then
> it lets me use SELinux to compartmentalize things even more.
>
> But yes, I had assumed that the embedded mode would be needed
> if python objects were to be shared directly across phases.
>
>
>> You can access the request object directly and put data in
>> req.subprocess_env and req.notes if you use apswigpy (SWIG bindings
>> for Apache). Stuffing values in req.subprocess_env means they will
>> show up in WSGI environ dictionary of subsequent phase and will also
>> make it across to WSGI application itself if run in daemon mode.
>>
>> Rather than use apswigpy, you could always write a custom Python
>> extension module to allow you to stuff values in same.
>
> Great, I already do use req.subprocess_env for a little bit now;
> though most of my complex objects just get stuffed into the req
> object directly as additional non-standard members.  I assume I
> don't need to touch the apswigpy to continue to use the
> subprocess_env, do I?

If wanting to set it from WSGI, yes you do, as the WSGI interface
doesn't give any access to the original Apache request object. It is a
special extension of mod_wsgi that allows a Python CObject reference to
be passed as part of the environ dictionary. This CObject reference
could then be used to construct a Python wrapper for the request object
using apswigpy.

If you are talking about whether setting env variables in mod_python
will show up in WSGI, then no, that doesn't require apswigpy.
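
A minimal sketch of the receiving side, assuming an earlier phase has
placed a string value in req.subprocess_env under the hypothetical key
"myapp.access_level". The key name and the "anonymous" default are
made up for illustration; the values are plain strings, so anything
richer has to be serialised first.

```python
def application(environ, start_response):
    # Values set in req.subprocess_env by an earlier phase appear as
    # ordinary string entries in the WSGI environ dictionary.
    level = environ.get("myapp.access_level", "anonymous")
    body = ("access level: %s\n" % level).encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

The application never touches the Apache request object, so the same
code works unchanged whether the value was set by mod_python or by
some other module.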

> If the subprocess_env does end up in the WSGI environment, then
> that seems like the safest and most future-proof approach to
> transitioning my content handlers to mod_wsgi.  Though I can't
> pass "live" objects so I'll have to serialize everything.  That
> should be straightforward enough, but perhaps not quite as efficient.
>
> This will though mean that my new content handlers (in wsgi) won't
> be using the exact same python objects as my access handler; so
> some extra things may not work.  For example my access handler
> does a bit of extra safety checking, such as making sure that
> there aren't any database transactions that are accidentally
> left in-progress and span more than one http request.  Also the
> access handler chooses which database account(s) to use for
> each HTTP request (thus also allowing the database to enforce
> its own internal access control rules too).  Basically my
> access handler makes most of the security decisions, and the
> content handler just focuses on the content.
>
> That kind of logic though could be moved up into the content
> handler phase, with some additional framework. Though it
> also means that the remainder of my access control logic and
> my content logic will necessarily have to each operate within
> separate database transaction scopes since they'll no longer
> be able to share the same python interpreter.
>
>
>> By mixing mod_python and mod_wsgi you do lose some configuration
>> control over mod_wsgi because mod_python will hijack the Python
>> interpreter initialisation and so mod_wsgi can't do any setup which
>> has to be done before that initialisation.
>
> Are the specifics of this interaction documented anywhere?

For some things yes, but not others.

> Also, this may be more of a question for the mod_python list,
> but since the development of mod_python is pretty much in a
> stable leave-it-alone mode; do you foresee anything coming
> where mod_wsgi and mod_python diverge so much that they will not
> be coexistent in the future (or that mod_python could be
> dropped entirely?)  Or what about Python 3.x, as I'm still
> running with Python 2.x?

It is extremely unlikely that mod_python will be changed and there is
some discussion in ASF at the moment as to whether it be shifted into
the ASF Attic. This will effectively mark it as dead unless a saviour
comes along and later resurrects it, at which point I believe it has
to go back through the ASF Incubator program before it can be
reestablished as a main project.

As it stands, mod_python will not, I believe, compile on Apache 2.3/2.4.
It certainly isn't going to work with Python 3.X. I am not even sure
for mod_wsgi what is going to happen with SWIG bindings for the Apache
API under Python 3.X, as I don't know how SWIG works under that, but the
WSGI parts of mod_wsgi already work on Python 3.X.

Anyway, I have been having a good think about some things at the
moment and maybe I might surprise you with some things to try. If you
can answer the one question for me about what you use stacked handlers
for and for which phases, it will help with one thing I am looking at.

Graham
