On 15-Apr-05, at 4:55 AM, Nick Kew wrote:
Rici Lake wrote:I was taking a look at the implementation of the renamed (but still misleading) AP_MODE_EATCRLF, in an attempt to figure out what a conforming input filter ought to do if it's handed that mode.
I think that really depends on your filter. I have two strategies there (setting aside simply ignoring the mode, which is wrong but works if you know where the filter's being used).
* If the filter is happy to work with lines, get data from the next filter using the same mode and process it that way. * Otherwise, assume the filter is unwanted, and remove itself from the filter chain.
Perhaps I wasn't being clear. AP_MODE_EATCRLF is not a renaming of AP_MODE_GETLINE; it is a renaming of AP_MODE_PEEK. (The renaming happened some three years ago in revision 92928, obviously I'm not al tanto.) To be more precise, it's a renaming of what the behaviour of AP_MODE_PEEK was.
The documented behaviour of AP_MODE_EATCRLF is: /** The filter should implicitly eat any CRLF pairs that it sees. */
The actual behaviour (at least in the only implementation of it in the tree) is somewhat more complex:
1) Regardless of the readtype, it only reads non-blocking
2) Similar to MODE_SPECULATIVE, it retains data for subsequent reads (but see below)
3) It does not, however, ever return any data.
4) It returns APR_SUCCESS only if data is immediately available. In determining whether or not data is immediately available, it ignores ([CR]?[LF})+ (but with a bug as noted in my first message)
5) If the only data immediately available is ignored line-endings as per (4), these line endings are not retained for subsequent reads. (Obviously, this cannot be relied upon to delete leading CR?LF's, since the scan only applies to currently available data.)
My argument is that this is extremely idiosyncratic behaviour and has no place in a reasonable interface design. It seems to me that the only consumer of AP_MODE_EATCRLF could just as well use AP_MODE_SPECULATIVE (which is implemented by the only provider which implements AP_MODE_EATCRLF) to do what it wants to do. Consequently, AP_MODE_EATCRLF could just be honourably dispatched to the retirement home for awkward interfaces, and imho we'd all be better off.
The following (as-yet-untested) code should do the trick. It is somewhat longer than the existing check_pipeline_flush but that needs to be balanced against the removal of an equal-sized chunk of server/core_filters.c and the end of AP_MODE_EATCRLF:
static void check_pipeline_flush(request rec *r)
{
apr_bucket *e
apr_bucket_brigade *bb;
conn_rec *c = r->connection;
bb = apr_brigade_create(r->pool, c->bucket_alloc);
if (c->keepalive != AP_CONN_CLOSE) {
/* The 8 is entirely arbitrary. 4 is probably sufficient */
if (ap_get_brigade(r->input_filters, bb, AP_MODE_SPECULATIVE,
APR_NONBLOCK_READ, 8) == APR_SUCCESS) {
const char *buf;
apr_size_t len, i;
int gotcr = 0;
int gotdata = 0;
for (e = APR_BRIGADE_FIRST(bb);
e != APR_BRIGADE_SENTINEL(bb);
e == APR_BUCKET_NEXT(e)) {
if (APR_BUCKET_IS_METADATA(e)
|| apr_bucket_read(e, &buf, &len,
APR_NONBLOCK_READ) != APR_SUCCESS) {
break;
}
for (i = 0; i < len; ++i) {
if (buf[i] == AP_ASCII_LF)
gotcr = 0;
else if (gotcr || buf[i] != AP_ASCII_CR) {
gotdata = 1;
break;
}
else
gotcr = 1;
}
if (gotdata) break;
}
if (gotdata) {
c->data_in_input_filters = 1;
apr_brigade_destroy(bb); /* possibly unnecessary */
return;
}
}
}
c->data_in_input_filters = 0;
e = apr_bucket_flush_create(c->bucket_alloc);
APR_BRIGADE_INSERT_HEAD(bb, e);
ap_pass_brigade(c->output_filters, bb);
}
Agreed, with the proviso that it doesn't really complicate things that much, as a filter can uninsert itself.
If the read mode is line-oriented, the filter really should handle it rather than uninserting itself, unless the filter believes itself to be filtering non-textual information.
Maybe we should have a declarative version of this, in the manner of mod_filter's protocol handling for output filters. So that any input filter passes an OR of input modes it works in to util_filter, and never gets inserted when called in a mode it doesn't want to support?
Perhaps, but you don't know at the point of insertion what read mode will be used. I would prefer to be forced to handle all read modes on the understanding that these modes are all reasonable, documented and that there are not too many of them :)
Suppose that GETLINE was not implemented by a filter but that SPECULATIVE was. In that case, the consumer could implement GETLINE itself by doing a speculative get_brigade of the maximum linelength; finding the line ending if any, and then doing a normal get_brigade of the number of bytes to the line-ending (thereby leaving the remaining bytes in the downstream filter's speculative setaside buffer).
It would probably be desirable to avoid doing a blocking read beyond the line-ending, so this could be improved by doing a non-blocking speculative get_brigade and then, if no line-end is detected doing a blocking get_brigade of one more than the number of bytes returned by the non-blocking speculative get_brigade (and then repeat as necessary).
That could all be done by a provided utility function. GETLINE mode is really just an optimisation around the setaside in the above algorithm, so it could be made 100% optional for an input filter; the getline utility could try the GETLINE mode get_brigade and, if it received a not-implemented indication from the filter, fall back to the above algorithm; the caller of the getline utility could remain blissfully unaware and the only complexity added to the implementer of a naive input filter would be returning NOTIMPL if it gets a mode which it can't handle.
I presume from the comment in the definition of SPECULATIVE mode that something like this was always in the plans.
