On Sun, 1 Aug 2004, Justin Erenkrantz wrote:
> --On Sunday, August 1, 2004 8:24 AM +0100 Nick Kew <[EMAIL PROTECTED]> wrote:
>
> >> I'm not sure what 'match' is in this context.
> >
> > In the above case, it could be "text/html" or "latin1".
> > ap_register_smart_filter("transcode", "latin1", charset_filter, ctx, flags);
> > ap_register_smart_filter("process", "text/html", html_filter, ctx, flags);
> >
> > But that really needs the flexibility of a regexp, so "latin1" becomes
> > "latin[-_]?1|iso[-_]?8859[-_]?1"
> > or might expand to include other close relatives like iso-8859-15
>
> Having an overhead of regexps by default in our filter code would seem to be
> a severe bottleneck.
Hmmm, how many configurations use no LocationMatch-family containers,
no AliasMatch and no Rewrite rules?
But anyway, fair point. Regex vs simple strcasecmp should be a flag.
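A minimal sketch of that flag in C, assuming POSIX <regex.h>: each provider pattern is compiled once at configuration time, and the per-request test is either a plain strcasecmp() or a single regexec(). The names match_mode, provider_match_init(), provider_matches() and provider_match_free() are hypothetical, not part of httpd. The charset pattern used below is adapted from the one quoted above and anchored with ^...$ so that a substring hit does not count.

```c
#include <regex.h>
#include <stddef.h>
#include <strings.h>

/* Hypothetical match modes: cheap exact match by default, regex on request. */
enum match_mode { MATCH_STRCASECMP, MATCH_REGEX };

struct provider_match {
    enum match_mode mode;
    const char *pattern;
    regex_t re;              /* compiled only when mode == MATCH_REGEX */
};

/* Called once at configuration time, so the regex is never recompiled
 * per request. Returns 0 on success, regcomp()'s error code otherwise. */
static int provider_match_init(struct provider_match *m,
                               enum match_mode mode, const char *pattern)
{
    m->mode = mode;
    m->pattern = pattern;
    if (mode == MATCH_REGEX)
        return regcomp(&m->re, pattern, REG_EXTENDED | REG_ICASE | REG_NOSUB);
    return 0;
}

/* Called per request: either one strcasecmp() or one regexec(). */
static int provider_matches(const struct provider_match *m, const char *value)
{
    if (m->mode == MATCH_STRCASECMP)
        return strcasecmp(m->pattern, value) == 0;
    return regexec(&m->re, value, 0, NULL, 0) == 0;
}

/* Release the compiled regex (in httpd this would be a pool cleanup). */
static void provider_match_free(struct provider_match *m)
{
    if (m->mode == MATCH_REGEX)
        regfree(&m->re);
}
```

Usage: after provider_match_init(&m, MATCH_REGEX, "^(latin[-_]?1|iso[-_]?8859[-_]?1)$"), provider_matches(&m, "ISO-8859-1") is true and provider_matches(&m, "utf-8") is false; in MATCH_STRCASECMP mode with "text/html", the value "TEXT/HTML" matches. Modules that never ask for MATCH_REGEX never pay for regexec().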
> I'd rather avoid that, or push it onto those few specific
> modules that want the power of regexps and are willing to pay the ridiculous cost
> penalties. The other significant thing you are missing in your API is what to
> match against. (I think you are assuming Content-Type, but there are a lot of
> cases where you want to match against something other than Content-Type.)
That's part of the proposed configuration, where we declare the filter
harness by name and say what it dispatches on:
FilterDeclare transcode AP_FTYPE_RESOURCE
FilterDispatcher transcode Content-Type [charset=([^;]+)]
FilterProvider transcode latin[-_]?1|iso[-_]?8859[-_]1 latin_1_filter
FilterProvider transcode [other providers for other matches]
(That's maybe a bit contrived: I don't have a real-life case where we
want multiple filters for different charsets, other than simply on/off.)
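To make the FilterDispatcher/FilterProvider split concrete, here is a minimal C sketch, assuming POSIX <regex.h>. The dispatcher expression (charset=([^;]+), as in the config above) pulls a value out of the Content-Type header, and that value is then tested against each provider pattern in turn. struct provider, provider_init(), dispatch_value() and select_provider() are hypothetical names for illustration, not an existing httpd API, and first-match-wins is an assumed policy.

```c
#include <regex.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory form of one FilterProvider line. */
struct provider {
    const char *filter;      /* filter to insert when the pattern matches */
    regex_t re;              /* compiled provider pattern */
};

static int provider_init(struct provider *p, const char *pattern,
                         const char *filter)
{
    p->filter = filter;
    return regcomp(&p->re, pattern, REG_EXTENDED | REG_ICASE | REG_NOSUB);
}

/* Apply the FilterDispatcher expression to a header value and copy its
 * first capture group into buf. Returns 1 if it matched, 0 otherwise. */
static int dispatch_value(const regex_t *dispatcher, const char *header,
                          char *buf, size_t len)
{
    regmatch_t m[2];
    if (regexec(dispatcher, header, 2, m, 0) != 0 || m[1].rm_so < 0)
        return 0;
    size_t n = (size_t)(m[1].rm_eo - m[1].rm_so);
    if (n >= len)
        n = len - 1;             /* truncate rather than overflow */
    memcpy(buf, header + m[1].rm_so, n);
    buf[n] = '\0';
    return 1;
}

/* Return the first matching provider's filter name, or NULL if none
 * matches (so no filter is inserted for this response). */
static const char *select_provider(const regex_t *dispatcher,
                                   const struct provider *p, size_t n,
                                   const char *header)
{
    char value[64];
    if (!dispatch_value(dispatcher, header, value, sizeof value))
        return NULL;
    for (size_t i = 0; i < n; i++)
        if (regexec(&p[i].re, value, 0, NULL, 0) == 0)
            return p[i].filter;
    return NULL;
}
```

For example, with the dispatcher compiled from charset=([^;]+) (REG_EXTENDED, without REG_NOSUB, since the capture is needed) and a provider mapping ^(latin[-_]?1|iso[-_]?8859[-_]?1)$ to latin_1_filter, a response with "Content-Type: text/html; charset=ISO-8859-1" selects latin_1_filter, while one with no charset parameter selects nothing.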
(BTW, if you think AP_FTYPE_RESOURCE should be AP_FTYPE_CONTENT_SET,
that points to another weakness of the architecture. If we need to
transcode *before* a content filter, then we can't use CONTENT_SET.
Solution: make the filter type configurable.)
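Making the type configurable could be as simple as accepting the type name as the second argument of FilterDeclare, as in the config above, and mapping it onto the numeric ordering httpd already uses for output filters. A minimal C sketch: the numeric values are those of ap_filter_type in httpd 2.x's util_filter.h, while parse_ftype() and runs_before() are hypothetical helpers, not httpd functions.

```c
#include <stddef.h>
#include <strings.h>

/* Filter type names and values as in ap_filter_type (httpd 2.x
 * util_filter.h); lower values sit closer to the content handler. */
static const struct { const char *name; int value; } ftypes[] = {
    { "AP_FTYPE_RESOURCE",    10 },
    { "AP_FTYPE_CONTENT_SET", 20 },
    { "AP_FTYPE_PROTOCOL",    30 },
    { "AP_FTYPE_TRANSCODE",   40 },
    { "AP_FTYPE_CONNECTION",  50 },
    { "AP_FTYPE_NETWORK",     60 },
};

/* Hypothetical parser for the type argument of a FilterDeclare line.
 * Returns the numeric type, or -1 if the name is unknown. */
static int parse_ftype(const char *name)
{
    for (size_t i = 0; i < sizeof ftypes / sizeof ftypes[0]; i++)
        if (strcasecmp(ftypes[i].name, name) == 0)
            return ftypes[i].value;
    return -1;
}

/* In the output chain, data passes through lower-typed filters first, so
 * a filter of type a sees the data before one of type b whenever a < b.
 * (Ordering between filters of equal type is not modelled here.) */
static int runs_before(int a, int b)
{
    return a < b;
}
```

With this, a transcoder declared as AP_FTYPE_RESOURCE runs ahead of anything declared AP_FTYPE_CONTENT_SET, and an admin who needs the opposite order only has to change the config line.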
> Remember that the content-length doesn't even need to be set *before* we go
> into the filter. (The fact that default_handler does it is more of an
> accident than anything else.) The content-length header is *not* normative
> and should almost always be ignored. (Of course, this is internal to httpd
Yes, of course.
The point is that Content-Length *is* set by many handlers, and has to be
unset by any filter that changes the length. The second point is that there
*are* a number of bugs arising from that (e.g. mod_deflate in 2.0.x, vs. the
recent fixes in 2.1-HEAD). The KISS principle tells us that simplifying the
task of filtering content will reduce the bug count.
> and brigades. It is not efficient to constantly compute the length as we push
> data through the filters.
No, but it is efficient simply to *unset* the length if one or more
filters are going to change it. Likewise, we need to handle byteranges and
Warning headers, and to unset a Last-Modified header when a filter
invalidates it (or make that configurable; cf. XBitHack).
Instead of requiring every filter to worry about all that, we let filters
simply declare their behaviour.
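A minimal C sketch of what "declaring behaviour" might look like, assuming hypothetical flag names (FILTER_CHANGES_LENGTH, FILTER_CHANGES_CONTENT, FILTER_NO_BYTERANGES) and a toy header array standing in for httpd's real header table. The harness ORs together the flags of every filter in the chain and fixes up the headers once, so no individual filter has to. Warning headers and an XBitHack-style override are left out for brevity.

```c
#include <stddef.h>
#include <string.h>
#include <strings.h>

/* Hypothetical flags a filter would declare when it is registered. */
#define FILTER_CHANGES_LENGTH   0x01  /* output length differs from input */
#define FILTER_CHANGES_CONTENT  0x02  /* Last-Modified no longer valid */
#define FILTER_NO_BYTERANGES    0x04  /* cannot be combined with Range */

/* Toy stand-in for a response header table. */
struct header { const char *name; const char *value; };

static void unset_header(struct header *h, size_t n, const char *name)
{
    for (size_t i = 0; i < n; i++)
        if (h[i].name && strcasecmp(h[i].name, name) == 0)
            h[i].name = h[i].value = NULL;   /* mark entry as removed */
}

/* Run once by the harness for the whole chain. Returns nonzero if
 * byte-range handling must be disabled for this response. */
static int harness_fixup(const unsigned *flags, size_t nfilters,
                         struct header *h, size_t nheaders)
{
    unsigned all = 0;
    for (size_t i = 0; i < nfilters; i++)
        all |= flags[i];
    if (all & FILTER_CHANGES_LENGTH)
        unset_header(h, nheaders, "Content-Length");
    if (all & FILTER_CHANGES_CONTENT)
        unset_header(h, nheaders, "Last-Modified");
    return (all & FILTER_NO_BYTERANGES) != 0;
}
```

For instance, a chain containing a filter declared with FILTER_CHANGES_LENGTH loses its Content-Length but keeps Last-Modified; adding a filter declared with FILTER_CHANGES_CONTENT also drops Last-Modified. The filters themselves never touch either header.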
> So, if a filter is relying upon the content-length HTTP metadata header and
> not the brigades it sees, then it's severely broken. Trying to restrict
> filters to pre-declare what they will do is, IMHO, silly and pointless. I
> don't see how a solution for pre-declaring the intention of a filter is going
> to provide any real benefits. Nothing can make use of that knowledge anyway
> because they have to account for all cases! So, any benefit for corner-case
> optimization is lost by the increase in complexity just added.
No, the whole point is to *reduce* complexity!
--
Nick Kew