> > That just sounds like the same thing with a blocking or non-blocking
> > flag. To be honest, I don't see how any input filters would need anything
> > except one bucket at a time. If the filter doesn't need it, it passes
> > it downstream, otherwise it chugs and spits out other buckets. What else
> > is there?
>
> Yuck. I think it'd be possible for input filters to buffer up or
> modify data and then pass them up with multiple buckets in a
> brigade rather than one bucket. Think of a mod_deflate input
> filter. -- justin
Let me be more precise. I'm not saying that we shouldn't use brigades. What I'm saying is that we shouldn't be dealing with specific types of data at this level. Right now, by requiring a filter to request "bytes" or "lines", we are seriously constraining the performance of the filters. A filter should only inspect the types of the buckets it retrieves and then move on. The bytes should only come into play once we have actually retrieved a bucket of a type that we are able to process.

Furthermore, we should be using a dynamic type system, and liberally creating new bucket types as we invent new implementations. Filters need not know which filters are upstream or downstream from them, but they should have been strategically placed to consume certain buckets from upstream filters and to produce certain buckets required by downstream filters.

[Warning: long-winded brainstorm follows:]

I want a typical filter chain to look like this:

  input_source --> protocol filters --> sub-protocol filters --> handlers

An input socket would produce this:

  SOCKET
  EOS

An http header parser filter would produce these:

  HEADER
  HEADER
  HEADER
  DATA    (extra data read past headers)
  SOCKET
  EOS

An http request parser would work only at the request level, performing dechunking, dealing with content-length, and dealing with pipelined requests. It would produce these:

  BEGIN_OF_REQUEST
  HEADERS
  BEGIN_OF_BODY_DATA
  BODY_DATA
  BODY_DATA
  BODY_DATA
  BODY_DATA
  END_OF_BODY_DATA
  TRAILERS...
  END_OF_REQUEST
  ...

and so on. A multipart input handler would then pass all types except BODY_DATA, which it could use to produce:

  ...
  MULTIPART_SECTION_BEGIN
  BODY_DATA
  MULTIPART_SECTION_END
  ...

Or a magic mime filter could simply buffer enough BODY_DATA buckets until it knew the type, prepending a MIME_TYPE to the front and sending the whole thing downstream:

  ...
  MIME_TYPE
  BODY_DATA
  BODY_DATA
  ...

The basic pattern for any input filter (which is pull-based at the moment in Apache) would be the following:

1. retrieve the next "abstract data unit"
2. inspect the "abstract data unit": can we operate on it?
3. if yes, operate_on(unit) and pass the result to the next filter
4. if no, pass the current unit to the next filter
5. go to #1

In this model, the operate_on() behavior has been separated from the mechanics of passing data around. I believe this would improve filter performance as well as simplify the implementation details that module authors must understand. I also think this would dramatically improve the extensibility of the Apache filters system.

[Sorry for the long brain dump. Some of these ideas have been floating around in my head for a long time. When they become clear enough I will write up a more formal and concise proposal for how I think the future filter system should work (possibly for 2.1 or beyond). I think the apr-serf project is a perfect place to play with some of these ideas. I would appreciate any constructive comments on the above.]

-aaron