On 21 Nov 2013, at 10:43 AM, Joe Orton <[email protected]> wrote: > Those rules are written (explicitly) for resource-level filters. They > would have to be a little different for CONNECTION level, e.g. EOS > handling should probably be different... though I'm not sure how we'd > write the rule. Probably connection filters should delete or pass EOS, > since they don't (shouldn't) care about request boundaries. Dunno. > > mod_ssl's output filter goes to some effort to apply (10) consistently, > and is agnostic to data bucket types (5), but yeah, the metadata > handling is looks wrong as you say.
Having looked at this is a little deeper, having implemented the initial pass at nonblocking support for the core, and then trying to implement how mod_ssl might use this, it has uncovered another snag - filters right now assume that APR_EAGAIN is a fatal error and leave immediately, when in reality we might be half way through transforming and writing a brigade. Right now, it seems that filters have the property that they are obliged to consume the brigade they have been given. What this means practically is that if mod_ssl is given a 4GB file bucket, it is obliged to consume the entire 4GB file bucket and ensure it is written downstream before returning, and that breaks write completion. Ideally, the core filter should be able to return APR_EAGAIN and mod_ssl should "do the right thing" to set aside the remainder before passing back the "error" to the caller, but it doesn't do that, and neither does mod_deflate and a host of other external filters. Ideally we need a new type of output filter, an asynchronous output filter, which in addition to the ten rules have these rules: - Empty brigades must be passed upstream and not ignored like now. This is so that an upstream filter with setaside data has the chance to keep writing it, without messing about with metadata buckets or other performance sapping stuff. - Async filters must understand and react to APR_EAGAIN from upstream, APR_EAGAIN is not an error, instead it is a request to call us again when we are writable, possibly with an empty brigade. We must setaside what we have so far unwritten (probably using functionality buried today in the core output filter, suitably reference counting the c->data_in_output_filters). - The core output filter becomes non blocking by default, if we want blocking behaviour we must explicitly send a flush bucket. While it would be nice to pass a "block/nonblock" flag like we do with the input filter stack this competes with the flush buckets, which effectively mean "block please". Defaulting to "nonblock until explicitly told otherwise" is nice and simple. Given that the world must be given an opportunity to migrate to async filters, we would need to support both at the same time. What I am thinking is that a filter registers itself either as a legacy synchronous filter (you'll be given a brigade, don't return until all the brigade is consumed), or an asynchronous filter (feel free to return APR_EAGAIN at any time, and setaside data if you receive APR_EAGAIN from upstream), you state explicitly which. The key bit will be the transition from an upstream async filter to an earlier sync filter, APR_EAGAIN will need to be trapped and responded to by sending a flush bucket up the stack, effectively allowing us to convert APR_EAGAIN into APR_SUCCESS which can then be safely received by the earlier sync filter. This is a simple task that could be accomplished inside ap_pass_brigade(). The event MPM would simply need to pass empty brigades to c->output_filters until c->data_in_output_filters returns to zero. The addition of a sync filter to the c->output_filters would cause write_completion to stop working, but it will still be functional until the sync filter became async. We currently have this situation where we keep track of the start of the protocol filter stack in c->output_filters instead of the real start of the stack, we could for now insert a helper protocol filter that runs first and waits until EOS is received meaning "synchronous processing is done", before setting aside the data and returning APR_SUCCESS, inviting event to enter write completion in the hope that everything from this filter onwards is async and therefore eligible for write completion. This filter can also be responsible for flow control, vastly simplifying the core_output_filter we have now. All of this is still backportable to v2.4. Regards, Graham --
