I answered my own question by implementing it and failing.  You can't bypass
mod_authz_host because it gets invoked via the magic macro:

  AP_IMPLEMENT_HOOK_RUN_ALL(int,access_checker,
                          (request_rec *r), (r), OK, DECLINED)

This means that returning OK from my handler does not prevent
mod_authz_host's handler from being called.
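
To illustrate why returning OK is not enough, here is a minimal,
self-contained sketch of how I understand the RUN_ALL and RUN_FIRST hook
semantics to differ. It is a simplified model and not the actual macro
expansion: the request_rec stand-in, run_all, run_first, and the two checker
functions are hypothetical names made up for this example, and the constants
are defined locally so it compiles without the Apache headers.

```c
#include <assert.h>
#include <stddef.h>

/* Local stand-ins so the sketch compiles without httpd.h (assumption). */
#define OK 0
#define DECLINED (-1)
#define HTTP_FORBIDDEN 403

typedef struct request_rec { const char *uri; } request_rec;
typedef int (*hook_fn)(request_rec *r);

/* RUN_ALL model: every hook is called; OK and DECLINED both mean
 * "keep going", and only another status stops the chain. */
int run_all(const hook_fn *hooks, size_t n, request_rec *r)
{
    assert(hooks != NULL || n == 0);
    for (size_t i = 0; i < n; ++i) {
        int rv = hooks[i](r);
        if (rv != OK && rv != DECLINED)
            return rv;
    }
    return OK;
}

/* RUN_FIRST model: the first hook that does not decline wins. */
int run_first(const hook_fn *hooks, size_t n, request_rec *r)
{
    assert(hooks != NULL || n == 0);
    for (size_t i = 0; i < n; ++i) {
        int rv = hooks[i](r);
        if (rv != DECLINED)
            return rv;
    }
    return DECLINED;
}

/* Hypothetical module hook that tries to approve the request early. */
int my_access_checker(request_rec *r) { (void)r; return OK; }

/* Hypothetical stand-in for a host-based check that denies the request. */
int deny_access_checker(request_rec *r) { (void)r; return HTTP_FORBIDDEN; }
```

With my_access_checker registered ahead of deny_access_checker, run_all still
reaches the second hook and returns its error status, whereas run_first would
stop at the first OK. That matches what I saw with access_checker.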

I came up with a simpler idea that does not require depending on
string-literals in mod_rewrite.c.

I still add a translate_name hook to run prior to mod_rewrite, but I don't
try to prevent mod_rewrite from corrupting my URL. Instead I just squirrel
away the uncorrupted URL in my own entry in request->notes so that I can use
that rather than request->unparsed_uri downstream when processing the
request.  This seems to work well.  The only drawback is that if the site
admin adds a mod_rewrite rule that mutates mod_pagespeed's resource name into
something that does not pass authentication, mod_authz_host will reject the
request before I can process it.  This seems like a reasonable tradeoff, as
that configuration would likely be borked in other ways besides
mod_pagespeed resources.
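
A rough, self-contained sketch of the stash-and-recover idea follows. It is a
model only and not the mod_pagespeed code: the request_rec stand-in, its
one-slot note, the key "example-original-url", and the function names are all
assumptions made for illustration. In a real module I would expect the stash
to go through apr_table_setn() on r->notes, with the hook registered via
ap_hook_translate_name() so that it runs ahead of mod_rewrite.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DECLINED (-1)
#define NOTE_KEY "example-original-url"  /* hypothetical note key */

/* Stand-in for the request fields used here (assumption, not httpd.h). */
typedef struct {
    const char *unparsed_uri;
    const char *note_key;   /* one-slot stand-in for the notes table */
    const char *note_val;
} request_rec;

/* Runs first: remember the URL as it arrived, then decline so that
 * later translate_name hooks still get their turn. */
int save_original_url(request_rec *r)
{
    assert(r != NULL);
    r->note_key = NOTE_KEY;
    r->note_val = r->unparsed_uri;
    return DECLINED;
}

/* Stand-in for a later hook that changes the URL. */
int fake_rewrite(request_rec *r)
{
    assert(r != NULL);
    r->unparsed_uri = "/rewritten/by/rule";
    return DECLINED;
}

/* Downstream: prefer the stashed URL, fall back to the current one. */
const char *original_url(const request_rec *r)
{
    assert(r != NULL);
    if (r->note_key != NULL && strcmp(r->note_key, NOTE_KEY) == 0)
        return r->note_val;
    return r->unparsed_uri;
}
```

The point of returning DECLINED from the saving hook is that it only observes
the request and leaves the rest of the translate_name chain untouched.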

Commentary would be welcome.

-Josh

On Mon, Jan 3, 2011 at 1:10 PM, Joshua Marantz <jmara...@google.com> wrote:

> I have implemented Ben's hack in mod_pagespeed in
> http://code.google.com/p/modpagespeed/source/detail?r=345 .  It works
> great.  But I am concerned that a subtle change to mod_rewrite.c will break
> this hack silently.  We would catch it in our regression tests, but the
> large number of Apache users that have downloaded mod_pagespeed do not
> generally run our regression tests.
>
> I have another idea for a solution that I'd like to see opinions on.
> Looking at Nick Kew's book, it seems like I could set request->filename to
> whatever I wanted, return OK, but then also shunt off access_checker for my
> rewritten resources.  The access checking on mod_pagespeed resources is
> redundant, because the resource will either be served from cache (in which
> case it had to be authenticated to get into the cache in the first place) or
> will be decoded and the original resource(s) fetched from the same server
> with full authentication.
>
> I'd appreciate any comments on this approach.
>
> -Josh
>
>
> On Mon, Jan 3, 2011 at 11:40 AM, Joshua Marantz <jmara...@google.com> wrote:
>
>> OK I tried to find a more robust alternative but could not.  I was
>> thinking I could duplicate whatever mod_rewrite was doing to set the request
>> filename, but that appears to be complex and probably no less brittle.
>>
>> I have another query on this.  In reality we do *not* want our rewritten
>> resources to be associated with a filename at all.  Apache should never look
>> for such things in the file system under ../htdocs -- they will not be
>> there.  We also do not need it to validate or authenticate on these static
>> resources.
>>
>> In particular, we have found that there is some path through Apache that
>> imposes what looks like a file-system-based limitation on URL segments (e.g.
>> around 256 bytes).  This limitation is inconvenient and, as far as I can
>> tell, superfluous.  URL limits imposed by proxies and browsers are more like
>> 2k bytes, which would allow us to encode more metadata in URLs (e.g.
>> sprites).  Is there some magic setting we could put into the request
>> structure to tell Apache not to interpret the request as being mapped from a
>> file, but just to pass it through to our handler?
>>
>> Thanks!
>> -Josh
>>
>> On Sat, Jan 1, 2011 at 6:24 AM, Ben Noordhuis <i...@bnoordhuis.nl> wrote:
>>
>>> On Sat, Jan 1, 2011 at 00:16, Joshua Marantz <jmara...@google.com>
>>> wrote:
>>> > Thanks for the quick response and the promising idea for a hack.  Looking at
>>> > mod_rewrite.c this does indeed look a lot more surgical, if, perhaps,
>>> > fragile, as mod_rewrite.c doesn't expose that string-constant in any formal
>>> > interface (even as a #define in a .h).  Nevertheless the solution is
>>> > easy-to-implement and easy-to-test, so...thanks!
>>>
>>> You're welcome, Joshua. :)
>>>
>>> You could try persuading a core committer to add this as a
>>> (semi-)official extension. Nick Kew reads this list, Paul Querna often
>>> idles in #node.js at freenode.net.
>>>
>>> > I'm also still wondering if there's a good source of official documentation
>>> > for the detailed semantics of interfaces like ap_hook_translate_name.
>>> > Neither a Google search, a stackoverflow.com search, nor the Apache
>>> > Modules book
>>> > <http://www.amazon.com/Apache-Modules-Book-Application-Development/dp/0132409674/ref=sr_1_1?ie=UTF8&qid=1293837117&sr=8-1>
>>> > offers much detail.
>>> > code.google.com fares a little better but just points to 4 existing usages.
>>>
>>> This question comes up often. In my experience the online
>>> documentation is almost always outdated, incomplete or outright wrong.
>>> I don't bother looking things up, I go straight to the source.
>>>
>>> It's a kind of job security, I suppose. There are only a handful of
>>> people that truly and deeply understand Apache. We can ask any hourly
>>> rate we want!
>>>
>>
>>
>
