Re: Ideas for Smart Filtering

Nick Kew Sun, 01 Aug 2004 02:55:08 -0700

On Sun, 1 Aug 2004, Justin Erenkrantz wrote:

> --On Sunday, August 1, 2004 8:24 AM +0100 Nick Kew <[EMAIL PROTECTED]> wrote:
>
> >> I'm not sure what 'match' is in this context.
> >
> > In the above case, it could be "text/html" or "latin1".
> >   ap_register_smart_filter("transcode", "latin1", charset_filter, ctx, flags);
> >   ap_register_smart_filter("process", "text/html", html_filter, ctx, flags);
> >
> > But that really needs the flexibility of a regexp, so "latin1" becomes
> >   "latin[-_]?1|iso[-_]?8859_?1"
> > or might expand to include other close relatives like iso-8859-15
>
> Having an overhead of regexp's by default in our filter code would seem to be
> a severe bottleneck.


Hmmm, how many configurations use no LocationMatch-family containers,
no AliasMatch and no Rewrite rules?

But anyway, fair point.  Regex vs simple strcasecmp should be a flag.
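
A rough sketch of what such a flag could look like, in plain C with POSIX
regex rather than the httpd/APR wrappers.  The names smart_match,
SMART_MATCH_STRING and SMART_MATCH_REGEX are invented here for
illustration and are not an existing API:

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>
#include <strings.h>

/* Hypothetical match modes; illustrative names, not an httpd API. */
#define SMART_MATCH_STRING 0
#define SMART_MATCH_REGEX  1

/* Returns 1 if value matches pattern under the chosen mode, else 0. */
static int smart_match(const char *pattern, const char *value, int mode)
{
    regex_t re;
    int matched;

    assert(pattern != NULL && value != NULL);

    /* Cheap path: case-insensitive string comparison, no regex cost. */
    if (mode == SMART_MATCH_STRING) {
        return strcasecmp(pattern, value) == 0;
    }

    /* Expensive path: compiled per call here for brevity; a real
     * harness would presumably compile once at configuration time. */
    if (regcomp(&re, pattern, REG_EXTENDED | REG_ICASE | REG_NOSUB) != 0) {
        return 0;
    }
    matched = (regexec(&re, value, 0, NULL, 0) == 0);
    regfree(&re);
    return matched;
}
```

Only providers that ask for SMART_MATCH_REGEX pay for the regex; everyone
else gets the strcasecmp path.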

>         I'd rather avoid that or push it on those few specific
> modules that want the power of regexp and willing to pay the ridiculous cost
> penalties.  The other significant thing you are missing in your API is what to
> match against.  (I think you are assuming Content-Type, but there's a lot of
> cases where you want to match against something other than Content-Type.)

That's part of the proposed configuration, when we declare the name for
the filter harness.

  FilterDeclare transcode AP_FTYPE_RESOURCE
  FilterDispatcher transcode Content-Type [charset=([^;]+)]
  FilterProvider transcode latin[-_]?1|iso[-_]?8859[-_]1 latin_1_filter
  FilterProvider transcode [other providers for other matches]

(that's maybe a bit contrived - I don't have a real-life case where we
want multiple filters other than on/off for different charsets)
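
To make the dispatch step concrete, here is a minimal sketch in plain C
with POSIX regex.  It assumes the dispatcher expression is
charset=([^;]+) (reading the square brackets in the config line as
delimiters), and the struct and function names are hypothetical:

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical provider entry: a pattern and the filter it selects. */
struct provider {
    const char *pattern;
    const char *filter_name;
};

/* Copy the first capture group of dispatch_re, applied to header, into
 * out.  Returns 1 on success, 0 on no match or if out is too small. */
static int dispatch_key(const char *dispatch_re, const char *header,
                        char *out, size_t outlen)
{
    regex_t re;
    regmatch_t m[2];
    int ok = 0;

    assert(dispatch_re != NULL && header != NULL && out != NULL);
    if (regcomp(&re, dispatch_re, REG_EXTENDED | REG_ICASE) != 0) {
        return 0;
    }
    if (regexec(&re, header, 2, m, 0) == 0 && m[1].rm_so != -1) {
        size_t len = (size_t)(m[1].rm_eo - m[1].rm_so);
        if (len < outlen) {
            memcpy(out, header + m[1].rm_so, len);
            out[len] = '\0';
            ok = 1;
        }
    }
    regfree(&re);
    return ok;
}

/* Return the filter name of the first provider whose pattern matches
 * key, or NULL if none does. */
static const char *select_provider(const struct provider *p, size_t n,
                                   const char *key)
{
    size_t i;

    for (i = 0; i < n; i++) {
        regex_t re;
        int hit;

        if (regcomp(&re, p[i].pattern,
                    REG_EXTENDED | REG_ICASE | REG_NOSUB) != 0) {
            continue;   /* skip providers with an invalid pattern */
        }
        hit = (regexec(&re, key, 0, NULL, 0) == 0);
        regfree(&re);
        if (hit) {
            return p[i].filter_name;
        }
    }
    return NULL;
}
```

Given "text/html; charset=ISO-8859-1", dispatch_key yields "ISO-8859-1"
and select_provider then picks latin_1_filter from the provider list.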

(btw, if you think AP_FTYPE_RESOURCE should be AP_FTYPE_CONTENT_SET,
that's another weakness of the architecture.  If we need to transcode
*before* a content filter, then we can't use CONTENT_SET.
Solution: this needs to be configurable).

> Remember that the content-length doesn't even need to be set *before* we go
> into the filter.  (The fact that default_handler does it is more of an
> accident than anything else.)  The content-length header is *not* normative
> and should almost always be ignored.  (Of course, this is internally to httpd

Yes of course.

The point is that content-length *is* set by many handlers, and has to be
unset by filters.  The second point is that there *are* a bunch of bugs
arising from that (e.g. mod_deflate in 2.0.x vs recent fixes in 2.1-HEAD).
The KISS principle tells us that simplifying the task of filtering
content will reduce the bug count.

> and brigades.  It is not efficient to constantly compute the length as we push
> data through the filters.

No, but it is efficient simply to *unset* the length if we have one or
more filters that are going to change it.  Likewise, we need to handle
byteranges and Warning headers.  And unset a Last-Modified header when
a filter invalidates it (or make it configurable - cf. XBitHack).

Instead of requiring every filter to worry about that, we let filters
simply declare their behaviour.
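
As a sketch of the idea (not a proposed API): the flag names, the toy
header array and harness_fixup below are all hypothetical, and only the
two headers named above are handled.  Byteranges and Warning would need
their own flags:

```c
#include <assert.h>
#include <stddef.h>
#include <strings.h>

/* Hypothetical behaviour flags a filter declares at registration. */
#define FILTER_CHANGES_LENGTH   0x1u  /* output size may differ from input */
#define FILTER_CHANGES_CONTENT  0x2u  /* Last-Modified no longer holds */

/* Toy stand-in for a response header table. */
struct header {
    const char *name;
    const char *value;   /* NULL means the header has been unset */
};

/* Unset every header called name (case-insensitive). */
static void unset_header(struct header *h, size_t n, const char *name)
{
    size_t i;

    for (i = 0; i < n; i++) {
        if (strcasecmp(h[i].name, name) == 0) {
            h[i].value = NULL;
        }
    }
}

/* Called once by the harness before the filter runs, so the filter
 * itself never has to touch these headers. */
static void harness_fixup(struct header *h, size_t n, unsigned flags)
{
    assert(h != NULL);
    if (flags & FILTER_CHANGES_LENGTH) {
        unset_header(h, n, "Content-Length");
    }
    if (flags & FILTER_CHANGES_CONTENT) {
        unset_header(h, n, "Last-Modified");
    }
}
```

The header bookkeeping then lives in one place, and each filter only
states which flags apply to it.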

> So, if a filter is relying upon the content-length HTTP metadata header and
> not the brigades it sees, then it's severely broken.  Trying to restrict
> filters to pre-declare what they will do is, IMHO, silly and pointless.  I
> don't see how a solution for pre-declaring the intention of a filter is going
> to provide any real benefits.  Nothing can make use of that knowledge anyway
> because they have to account for all cases!  So, any benefit for corner-case
> optimization is lost by the increase in complexity just added.

No, the whole point is to *reduce* complexity!

-- 
Nick Kew
