> > That just sounds like the same thing with a blocking or non-blocking
> > flag. To be honest, I don't see how any input filters would need anything
> > except one bucket at a time. If the filter doesn't need it, it passes
> > it downstream, otherwise it chugs and spits out other buckets. What else
> > is there?
>
> Yuck. I think it'd be possible for input filters to buffer up or
> modify data and then pass them up with multiple buckets in a
> brigade rather than one bucket. Think of a mod_deflate input
> filter. -- justin
Let me be more precise. I'm not saying that we shouldn't use brigades. What I'm saying is that we shouldn't be dealing with specific types of data at this level. Right now, by requiring a filter to request "bytes" or "lines", we are seriously constraining the performance of the filters. A filter should only inspect the types of the buckets it retrieves and then move on. The bytes should only come into play once we have actually retrieved a bucket of a type that we are able to process.

Furthermore, we should be using a dynamic type system, and liberally creating new bucket types as we invent new implementations. Filters need not know which filters are upstream or downstream from them, but they should have been strategically placed to consume certain buckets from upstream filters and to produce certain buckets required by downstream filters.

[Warning: long-winded brainstorm follows:]

I want a typical filter chain to look like this:

  input_source --> protocol filters --> sub-protocol filters --> handlers

An input socket would produce this:

  SOCKET
  EOS

An http header parser filter would produce these:

  HEADER
  HEADER
  HEADER
  DATA    (extra data read past headers)
  SOCKET
  EOS

An http request parser would work only at the request level, performing dechunking, dealing with content-length, and dealing with pipelined requests. It would produce these:

  BEGIN_OF_REQUEST
  HEADERS
  BEGIN_OF_BODY_DATA
  BODY_DATA
  BODY_DATA
  BODY_DATA
  BODY_DATA
  END_OF_BODY_DATA
  TRAILERS...
  END_OF_REQUEST
  ...

and so on. A multipart input handler would then pass all types except BODY_DATA, which it could use to produce:

  ...
  MULTIPART_SECTION_BEGIN
  BODY_DATA
  MULTIPART_SECTION_END
  ...

Or a magic mime filter could simply buffer enough BODY_DATA buckets until it knew the type, prepending a MIME_TYPE to the front and sending the whole thing downstream:

  ...
  MIME_TYPE
  BODY_DATA
  BODY_DATA
  ...

The basic pattern for any input filter (which is pull-based at the moment in Apache) would be the following:

1. retrieve the next "abstract data unit"
2. inspect the "abstract data unit": can we operate on it?
3. if yes, operate_on(unit) and pass the result to the next filter
4. if no, pass the current unit to the next filter
5. go to #1

In this model, the operate_on() behavior has been separated from the mechanics of passing data around. I believe this would improve filter performance as well as simplify the implementation details that module authors must understand. I also think this would dramatically improve the extensibility of the Apache filters system.

[Sorry for the long brain dump. Some of these ideas have been floating around in my head for a long time. When they become clear enough I will write up a more formal and concise proposal for how I think the future filter system should work (possibly for 2.1 or beyond). I think the apr-serf project is a perfect place to play with some of these ideas. I would appreciate any constructive comments on the above.]

-aaron