Hi All,
As an access control module developer I feel compelled to weigh in here.
First, I would like to point out that you cannot determine whether a page
was password protected based on the Apache configuration files including
.htaccess. Modules such as the one we develop, which is widely deployed,
plug into the access hook and use more robust notions of access policies.
The only safe way you can determine what content is safe to cache is by
having a robust set of configurable rules in your caching module (or proxy)
that allows a site administrator to set rules that determine what is safe to
cache. These rules can be as simple as "cache these URLs" or as complex as
"cache all URLs that end with .gif when no cookies are sent inbound or set
outbound and when there is no cache control header telling us not to cache."
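To make the more complex rule concrete, here is a rough sketch in Python. It
is not Apache's API: the Exchange class and is_cacheable_gif function are
hypothetical names, and treating no-cache, no-store and private as the
"do not cache" directives is my assumption about what an administrator
would want.

```python
from dataclasses import dataclass, field


@dataclass
class Exchange:
    """Hypothetical view of one request/response pair as seen by the cache."""
    url: str
    request_headers: dict = field(default_factory=dict)
    response_headers: dict = field(default_factory=dict)


def is_cacheable_gif(ex: Exchange) -> bool:
    """Allow caching of .gif URLs only when no cookies flow in either
    direction and no Cache-Control directive forbids caching."""
    # HTTP header names are case-insensitive, so normalize them first.
    req = {k.lower(): v for k, v in ex.request_headers.items()}
    resp = {k.lower(): v for k, v in ex.response_headers.items()}

    if not ex.url.lower().endswith(".gif"):
        return False
    if "cookie" in req or "set-cookie" in resp:
        return False

    # Split "private, max-age=60" into directive names: {"private", "max-age"}.
    directives = {
        d.strip().lower().split("=")[0]
        for d in resp.get("cache-control", "").split(",")
    }
    # Assumed set of directives that mean "do not cache this here".
    return not (directives & {"no-cache", "no-store", "private"})
```

A real module would read such rules from configuration rather than hard-code
them; the point is only that the decision comes from administrator-defined
policy, not from inspecting .htaccess.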
The caching done in mod_cache and in reverse proxy caches is different from
browser caches and forward proxy caches from an access control point of view
in that it is within the scope and responsibility of the enterprise that
owns the content being cached. To put it another way, mod_cache and rproxy
caches are responding to real requests made back to the server, whereas
browser caches and forward proxy caches are not.
I agree with Graham's notion that any hook designed primarily for caching
should be moved to after the access control hooks. I think the direct
performance impact will be minimal and easily made up for by the fact that
you will be able to cache more content since you will no longer be
constrained by whether or not the content is protected (only by whether or
not it is dynamic for each user/request).
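As a rough illustration of that ordering (a Python sketch, not Apache code;
serve, is_allowed and generate are hypothetical names), the cache is only
consulted once the access check has passed, so a policy change takes effect
on the very next request even if the page is still cached. It assumes the
content is the same for every user who is allowed to see it.

```python
def serve(url, user, is_allowed, cache, generate):
    """Return (status, body). Access control runs on every request,
    and the cache is consulted only after it succeeds."""
    if not is_allowed(user, url):
        # Denied requests never reach the cache, even on a cache hit.
        return 403, None
    if url not in cache:
        # Cache miss: build the response once and remember it.
        cache[url] = generate(url)
    return 200, cache[url]
```

Here cache is a plain dict, is_allowed(user, url) stands in for whatever the
access control modules decide, and generate(url) stands in for the expensive
content handler.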
Additionally, I'd like to point out that total reliance on cache control
headers is not a good option. Unfortunately, the reality is that most
dynamic web applications are unaware of these headers and incapable of
setting them or honoring them. At the same time, an access control module
may not want to set a do-not-cache header on every request because it might
be acceptable (as described above) for a browser cache to cache a piece of
content but not for mod_cache or a reverse proxy cache. Enterprises must
strike a delicate balance here. This balance is usually reached by using
cache control headers to communicate with the end user and using other
intelligent cooperation for the enterprise's internal caching mechanisms.
Stepping back a little to discuss the hook in general rather than mod_cache,
I agree strongly with Ryan that it is dangerous and to some extent
conceptually flawed. A well-designed API that uses callbacks should allow
people to plug in to do the work that was designed to happen in that hook
without having to worry too much about whether or not some other module that
plugs in somewhere else is going to circumvent that hook. Even if the
quick_handler hook is being used appropriately and safely by one module,
that does not justify the power given to it.
thanks,
Brian
Brian Eidelman
Netegrity Inc.
-----Original Message-----
From: Bill Stoddard [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, July 31, 2002 8:10 AM
To: [EMAIL PROTECTED]
Subject: RE: quick_handler hook is completely bogus.
> Ryan Bloom wrote:
>
> > 1) If I have a page that I have served and it gets put in the cache,
> > then it will be served out of the quick_handler phase. However, if I
> > then add or modify a .htaccess file to deny access to that page, then my
> > changes won't be honored until the page expires from the cache.
>
> True, and from any other cache along the chain between browser and server.
>
> > This is
> > a security hole, because I don't know of any way to invalidate cached
> > pages.
>
> Sort of, leaning towards yes. I don't believe it's a hole as such
> (because behaviour is consistent with any number of caches) however it
> does violate the principle of least astonishment - when Apache's config
> changes, Apache should be serving pages that correspond to that config
> immediately, not later after cache expiry.
>
> > 2) If I have a page that uses access checking to ensure that only
> > certain people can request the page, the cache_filter will put it in the
> > quick handler. However, the page may not be allowed to people who will
> > request it from the cache. I may be wrong about this one, but I see how
> > the cache disallows pages that require authentication. I do not see how
> > it can disallow caching of pages that require access_checking.
>
> Does the cache store pages protected by access control? If it does, the
> cache should respond to Cache-Control: private headers set by the auth
> modules (does Apache set these headers correctly?), OR - each user's
> page could be considered a variant of the URL, which means the page gets
> cached, but a version for each user, guaranteeing that user A does not
> see pages from user B, ever.
>
> Hmmm... this could be a good thing for people who have password
> protected websites that are expensive to generate. The cache could still
> cache password protected stuff, but safely.
mod_cache does not currently cache password protected content, though it
could be made to do so relatively easily by defining a handler for mod_cache
and serving password protected content from the handler. This is a feature I
intend to implement.
>
> > 3) It isn't possible for a module author to circumvent the
> > quick_handler phase. If I write a module that doesn't want to allow the
> > quick_handler phase, for security reasons, I can't enforce it.
>
> True.
That is true of all the other hooks as well. It appears you are judging the
quick_handler much more harshly than the other hooks, which can just as
easily be abused to the detriment of the server's security. Use the hook
correctly and there are no security exposures. I think we could easily
modify the server to enable quick_handler to return one of three or four
values: DONE, DECLINED, OK or an HTTP error. Returning DONE would allow the
server to skip calling other modules' quick_handlers (if there are any).
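A rough sketch of that dispatch logic, in Python rather than Apache's C and
with a hypothetical run_quick_handlers function. The numeric values match
the constants in httpd.h, but the rest is assumption: how OK would differ
from DONE is not spelled out above, so the sketch assumes OK is recorded
while later quick_handlers still get to run, and that an HTTP error stops
the chain just as DONE does.

```python
OK, DECLINED, DONE = 0, -1, -2  # numeric values match Apache's httpd.h


def run_quick_handlers(handlers, request):
    """Run quick handlers in order and return the status that decided
    the outcome (DECLINED if no handler took the request)."""
    result = DECLINED
    for handler in handlers:
        status = handler(request)
        if status == DECLINED:
            continue  # this module has nothing to say; try the next one
        if status == DONE or status >= 400:
            # DONE or an HTTP error (assumed): skip all remaining handlers.
            return status
        # Assumed semantics: OK (or any other status) is recorded,
        # but later quick handlers are still called.
        result = status
    return result
```

A result of DECLINED would mean the request falls through to the normal
request cycle.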
>
> > While I understand that we are giving people a lot of rope and asking
> > them to use it wisely, this phase gives too much rope, and invites
> > people to hang themselves.
>
> I think quick_handler should not be removed, but rather moved to before
> the handler, but after the auth.
I strongly disagree. The whole idea of the quick_handler is to bypass
location_walk, directory_walk, access and auth checking. The overhead for
all these steps adds up. I also have a prototype ESI fragment
cache/assembler that relies on the quick handler being right where it is in
the request cycle.
Bill