I answered my own question by implementing it and failing. You can't bypass
mod_authz_host because it gets invoked via the magic macro:

    AP_IMPLEMENT_HOOK_RUN_ALL(int,access_checker, (request_rec *r), (r), OK, DECLINED)

This means that returning OK from my handler does not prevent
mod_authz_host's handler from being called.

I came up with a simpler idea that does not require depending on
string-literals in mod_rewrite.c. I still add a translate_name hook to run
prior to mod_rewrite, but I don't try to prevent mod_rewrite from corrupting
my URL. Instead I just squirrel away the uncorrupted URL in my own entry in
request->notes so that I can use that rather than request->unparsed_uri
downstream when processing the request.

This seems to work well. The only drawback is if the site admin adds a
mod_rewrite rule that mutates mod_pagespeed's resource name into something
that does not pass authentication, then mod_authz_host will reject the
request before I can process it. This seems like a reasonable tradeoff as
that configuration would likely be borked in other ways besides
mod_pagespeed resources.

Commentary would be welcome.

-Josh

On Mon, Jan 3, 2011 at 1:10 PM, Joshua Marantz <jmara...@google.com> wrote:

> I have implemented Ben's hack in mod_pagespeed in
> http://code.google.com/p/modpagespeed/source/detail?r=345 . It works
> great. But I am concerned that a subtle change to mod_rewrite.c will break
> this hack silently. We would catch it in our regression tests, but the
> large number of Apache users that have downloaded mod_pagespeed do not
> generally run our regression tests.
>
> I have another idea for a solution that I'd like to see opinions on.
> Looking at Nick Kew's book, it seems like I could set request->filename to
> whatever I wanted, return OK, but then also shunt off access_checker for my
> rewritten resources.
> The access checking on mod_pagespeed resources is redundant, because the
> resource will either be served from cache (in which case it had to be
> authenticated to get into the cache in the first place) or will be decoded
> and the original resource(s) fetched from the same server with full
> authentication.
>
> I'd appreciate any comments on this approach.
>
> -Josh
>
>
> On Mon, Jan 3, 2011 at 11:40 AM, Joshua Marantz <jmara...@google.com> wrote:
>
>> OK I tried to find a more robust alternative but could not. I was
>> thinking I could duplicate whatever mod_rewrite was doing to set the
>> request filename, but that appears to be complex and probably no less
>> brittle.
>>
>> I have another query on this. In reality we do *not* want our rewritten
>> resources to be associated with a filename at all. Apache should never look
>> for such things in the file system under ../htdocs -- they will not be
>> there. We also do not need it to validate or authenticate on these static
>> resources.
>>
>> In particular, we have found that there is some path through Apache that
>> imposes what looks like a file-system-based limitation on URL segments
>> (e.g. around 256 bytes). This limitation is inconvenient and, as far as I
>> can tell, superfluous. URL limits imposed by proxies and browsers are more
>> like 2k bytes, which would allow us to encode more metadata in URLs (e.g.
>> sprites). Is there some magic setting we could put into the request
>> structure to tell Apache not to interpret the request as being mapped from
>> a file, but just to pass it through to our handler?
>>
>> Thanks!
>> -Josh
>>
>> On Sat, Jan 1, 2011 at 6:24 AM, Ben Noordhuis <i...@bnoordhuis.nl> wrote:
>>
>>> On Sat, Jan 1, 2011 at 00:16, Joshua Marantz <jmara...@google.com>
>>> wrote:
>>> > Thanks for the quick response and the promising idea for a hack.
>>> > Looking at mod_rewrite.c this does indeed look a lot more surgical,
>>> > if, perhaps, fragile, as mod_rewrite.c doesn't expose that
>>> > string-constant in any formal interface (even as a #define in a .h).
>>> > Nevertheless the solution is easy-to-implement and easy-to-test,
>>> > so...thanks!
>>>
>>> You're welcome, Joshua. :)
>>>
>>> You could try persuading a core committer to add this as a
>>> (semi-)official extension. Nick Kew reads this list, Paul Querna often
>>> idles in #node.js at freenode.net.
>>>
>>> > I'm also still wondering if there's a good source of official
>>> > documentation for the detailed semantics of interfaces like
>>> > ap_hook_translate_name. Neither a Google Search, a stackoverflow.com
>>> > search, nor the Apache Modules
>>> > <http://www.amazon.com/Apache-Modules-Book-Application-Development/dp/0132409674/ref=sr_1_1?ie=UTF8&qid=1293837117&sr=8-1>
>>> > book offer much detail. code.google.com fares a little better but
>>> > just points to 4 existing usages.
>>>
>>> This question comes up often. In my experience the online
>>> documentation is almost always outdated, incomplete or outright wrong.
>>> I don't bother looking things up, I go straight to the source.
>>>
>>> It's a kind of job security, I suppose. There are only a handful of
>>> people that truly and deeply understand Apache. We can ask any hourly
>>> rate we want!
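
P.S. To illustrate the RUN_ALL point from the top of this message, here is a
minimal standalone sketch of why an OK from one access_checker hook cannot
shield a request from a later one. It is a simulation, not httpd code:
fake_request, hook_fn, run_all, run_first, pagespeed_checker and host_checker
are all hypothetical stand-ins, and the real dispatch loops are generated by
the hook macros. The value returned when every hook passes is simplified here.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in status codes. The numeric values mirror httpd's OK, DECLINED and
 * HTTP_FORBIDDEN, but no real Apache header is included here. */
enum { OK = 0, DECLINED = -1, HTTP_FORBIDDEN = 403 };

/* Hypothetical stripped-down request: just enough to count hook calls. */
typedef struct {
    const char *uri;
    int hooks_called;
} fake_request;

typedef int (*hook_fn)(fake_request *r);

/* RUN_ALL style: keep going while hooks return OK or DECLINED, and stop at
 * the first hook that returns anything else. An OK does not end the loop.
 * Returning OK when every hook passes is a simplification. */
static int run_all(const hook_fn *hooks, size_t n, fake_request *r)
{
    for (size_t i = 0; i < n; ++i) {
        int rv = hooks[i](r);
        if (rv != OK && rv != DECLINED)
            return rv;
    }
    return OK;
}

/* RUN_FIRST style (used for translate_name): stop at the first hook that
 * returns anything other than DECLINED, so an OK does end the loop. */
static int run_first(const hook_fn *hooks, size_t n, fake_request *r)
{
    for (size_t i = 0; i < n; ++i) {
        int rv = hooks[i](r);
        if (rv != DECLINED)
            return rv;
    }
    return DECLINED;
}

/* Hypothetical module hook that approves every request. */
static int pagespeed_checker(fake_request *r)
{
    r->hooks_called++;
    return OK;
}

/* Hypothetical stand-in for a host-based check that rejects the request. */
static int host_checker(fake_request *r)
{
    r->hooks_called++;
    return HTTP_FORBIDDEN;
}

/* Runs both dispatch styles over the same hook list and checks the result. */
static void demo(void)
{
    const hook_fn hooks[] = { pagespeed_checker, host_checker };

    fake_request a = { "/example.css", 0 };
    assert(run_all(hooks, 2, &a) == HTTP_FORBIDDEN); /* OK did not shield it */
    assert(a.hooks_called == 2);

    fake_request b = { "/example.css", 0 };
    assert(run_first(hooks, 2, &b) == OK);           /* OK short-circuited */
    assert(b.hooks_called == 1);
}
```

Calling demo() from a main() exercises both loops. The contrast is why the
translate_name hook can pre-empt mod_rewrite by returning OK, while the same
trick does nothing for access_checker.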