On Sun, 1 Aug 2004, Justin Erenkrantz wrote:

> --On Sunday, August 1, 2004 8:24 AM +0100 Nick Kew <[EMAIL PROTECTED]> wrote:
>
> >> I'm not sure what 'match' is in this context.
> >
> > In the above case, it could be "text/html" or "latin1".
> > ap_register_smart_filter("transcode", "latin1", charset_filter, ctx,
> >                          flags);
> > ap_register_smart_filter("process", "text/html", html_filter, ctx,
> >                          flags);
> >
> > But that really needs the flexibility of a regexp, so "latin1" becomes
> > "latin[-_]?1|iso[-_]?8859_?1"
> > or might expand to include other close relatives like iso-8859-15
>
> Having an overhead of regexp's by default in our filter code would seem to be
> a severe bottleneck.

Hmmm, how many configurations don't use any LocationMatch/family containers
nor AliasMatch or Rewrite rules?  But anyway, fair point.  Regex vs simple
strcasecmp should be a flag.

> I'd rather avoid that or push it on those few specific
> modules that want the power of regexp and willing to pay the ridiculous cost
> penalties.  The other significant thing you are missing in your API is what to
> match against.  (I think you are assuming Content-Type, but there's a lot of
> cases where you want to match against something other than Content-Type.)

That's part of the proposed configuration, when we declare the name for
the filter harness.

FilterDeclare    transcode AP_FTYPE_RESOURCE
FilterDispatcher transcode Content-Type [charset=([^;]+)]
FilterProvider   transcode latin[-_]?1|iso[-_]?8859[-_]1 latin_1_filter
FilterProvider   transcode [other providers for other matches]

(that's maybe a bit contrived - I don't have a real-life case where we
want multiple filters other than on/off for different charsets)

(btw, if you think AP_FTYPE_RESOURCE should be AP_FTYPE_CONTENT_SET,
that's another weakness of the architecture.  If we need to transcode
*before* a content filter, then we can't use CONTENT_SET.  Solution:
this needs to be configurable).

> Remember that the content-length doesn't even need to be set *before* we go
> into the filter.  (The fact that default_handler does it is more of an
> accident than anything else.)  The content-length header is *not* normative
> and should almost always be ignored.  (Of course, this is internally to httpd

Yes of course.  The point is that content-length *is* set by many handlers,
and has to be unset by filters.  The second point is that there *are*
a bunch of bugs arising from that (e.g. mod_deflate in 2.0.x vs recent
fixes in 2.1-HEAD).  The KISS principle tells us that simplifying the
task of filtering content will reduce the bug count.

> and brigades.  It is not efficient to constantly compute the length as we push
> data through the filters.
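
An aside on the FilterDispatcher/FilterProvider lines above: a minimal
sketch of how such a dispatch could look, with regex vs strcasecmp as a
per-provider flag.  This is not httpd code.  The provider struct and the
functions provider_init, extract_charset and dispatch are hypothetical
names, it uses POSIX regex.h rather than anything from APR, and
"utf8_filter" in the usage note is made up.  The regex is compiled once
at setup time, so the per-request cost is one regexec or one strcasecmp.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <strings.h>

/* Hypothetical provider entry, not an httpd structure. */
typedef struct {
    const char *filter;   /* name of the filter to insert on a match */
    const char *literal;  /* pattern text; compared directly if !use_regex */
    int use_regex;        /* the flag: 0 = strcasecmp, 1 = POSIX regex */
    regex_t re;           /* compiled pattern, valid only if use_regex */
} provider;

/* Set up one provider.  Regex patterns are anchored so that
 * "iso-8859-15" does not match a pattern meant for "iso-8859-1".
 * Returns 0 on success, -1 on failure. */
int provider_init(provider *p, const char *filter,
                  const char *pattern, int use_regex)
{
    char anchored[256];
    int n;

    assert(p != NULL && filter != NULL && pattern != NULL);
    p->filter = filter;
    p->literal = pattern;
    p->use_regex = use_regex;
    if (!use_regex)
        return 0;

    n = snprintf(anchored, sizeof anchored, "^(%s)$", pattern);
    if (n < 0 || (size_t)n >= sizeof anchored)
        return -1;
    if (regcomp(&p->re, anchored, REG_EXTENDED | REG_ICASE | REG_NOSUB) != 0)
        return -1;
    return 0;
}

/* Pull the charset parameter out of a Content-Type value, using the
 * same expression as the FilterDispatcher line: charset=([^;]+)
 * Returns 0 on success, -1 if absent or too long for out. */
int extract_charset(const char *content_type, char *out, size_t outlen)
{
    regex_t re;
    regmatch_t m[2];
    size_t len;
    int rc;

    assert(content_type != NULL && out != NULL);
    if (regcomp(&re, "charset=([^;]+)", REG_EXTENDED | REG_ICASE) != 0)
        return -1;
    rc = regexec(&re, content_type, 2, m, 0);
    regfree(&re);
    if (rc != 0)
        return -1;

    len = (size_t)(m[1].rm_eo - m[1].rm_so);
    if (len >= outlen)
        return -1;
    memcpy(out, content_type + m[1].rm_so, len);
    out[len] = '\0';
    return 0;
}

/* Return the filter name of the first provider matching value,
 * or NULL if none matches. */
const char *dispatch(const provider *providers, size_t n, const char *value)
{
    size_t i;

    assert(providers != NULL && value != NULL);
    for (i = 0; i < n; i++) {
        const provider *p = &providers[i];
        int hit = p->use_regex
            ? regexec(&p->re, value, 0, NULL, 0) == 0
            : strcasecmp(p->literal, value) == 0;
        if (hit)
            return p->filter;
    }
    return NULL;
}
```

Usage would be along the lines of: provider_init() one entry with the
latin_1_filter pattern above and use_regex set, another with a plain
"utf-8" literal mapped to utf8_filter, then call extract_charset() on
the Content-Type value and hand the result to dispatch().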

No, but it is efficient simply to *unset* the length if we have one or
more filters that are going to change it.  Likewise, we need to handle
byteranges and Warning headers.  And unset a Last-Modified header when a
filter invalidates it (or make it configurable - cf. XBitHack).  Instead of
requiring every filter to worry about that, we let filters simply declare
their behaviour.

> So, if a filter is relying upon the content-length HTTP metadata header and
> not the brigades it sees, then it's severely broken.  Trying to restrict
> filters to pre-declare what they will do is, IMHO, silly and pointless.  I
> don't see how a solution for pre-declaring the intention of a filter is going
> to provide any real benefits.  Nothing can make use of that knowledge anyway
> because they have to account for all cases!  So, any benefit for corner-case
> optimization is lost by the increase in complexity just added.

No, the whole point is to *reduce* complexity!

-- 
Nick Kew
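
To make "filters simply declare their behaviour" concrete, here is a
minimal sketch, assuming a harness that looks at a filter's declared
flags once, before the filter runs, and fixes up the response headers on
its behalf.  None of this is httpd API.  The FILTER_* flags, header_table
and harness_prepare_headers are hypothetical, the header table is a
plain array standing in for r->headers_out, and treating byteranges as
"unset Accept-Ranges" is an assumption.  Warning headers are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <strings.h>

/* Hypothetical declarations a filter could make about itself. */
enum {
    FILTER_CHANGES_LENGTH  = 1 << 0, /* output size differs from input */
    FILTER_CHANGES_CONTENT = 1 << 1, /* Last-Modified is invalidated */
    FILTER_NO_BYTERANGES   = 1 << 2  /* byte ranges can't be honoured */
};

typedef struct {
    const char *name;
    const char *value;
} header;

/* Stand-in for a response header table. */
typedef struct {
    header h[16];
    size_t n;
} header_table;

/* Remove every header called name (case-insensitive).
 * Uses swap-remove, so header order is not preserved. */
void header_unset(header_table *t, const char *name)
{
    size_t i = 0;

    assert(t != NULL && name != NULL);
    while (i < t->n) {
        if (strcasecmp(t->h[i].name, name) == 0)
            t->h[i] = t->h[--t->n];
        else
            i++;
    }
}

/* Return the value of the first header called name, or NULL. */
const char *header_get(const header_table *t, const char *name)
{
    size_t i;

    assert(t != NULL && name != NULL);
    for (i = 0; i < t->n; i++) {
        if (strcasecmp(t->h[i].name, name) == 0)
            return t->h[i].value;
    }
    return NULL;
}

/* Called once by the harness before the filter sees any data.
 * The filter itself never has to touch these headers. */
void harness_prepare_headers(header_table *out, unsigned declared)
{
    assert(out != NULL);
    if (declared & FILTER_CHANGES_LENGTH)
        header_unset(out, "Content-Length");
    if (declared & FILTER_CHANGES_CONTENT)
        header_unset(out, "Last-Modified");
    if (declared & FILTER_NO_BYTERANGES)
        header_unset(out, "Accept-Ranges");
}
```

The design choice is that the header bookkeeping lives in one place: a
compressing filter would declare FILTER_CHANGES_LENGTH and nothing more,
and the unset happens once per response rather than per brigade.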