http://people.apache.org/~jorton/output-filters.html
How does this look? Anything missed out, anything that doesn't make sense? I think this covers most of the major problems in output filters which keep coming up.

I'd also like to add a simple buffering filter which "does things right" and can be used as a reference; all other in-tree filters are either too complicated (filters/*, http/* etc) or too awful (experimental/*). Any objections?

Regards, joe
Index: docs/manual/developer/output-filters.xml
===================================================================
--- docs/manual/developer/output-filters.xml (revision 0)
+++ docs/manual/developer/output-filters.xml (revision 0)
@@ -0,0 +1,457 @@
+<?xml version="1.0" encoding="UTF-8" ?>
+<!DOCTYPE manualpage SYSTEM "../style/manualpage.dtd">
+<?xml-stylesheet type="text/xsl" href="../style/manual.en.xsl"?>
+
+<!--
+ Licensed to the Apache Software Foundation (ASF) under one or more
+ contributor license agreements.  See the NOTICE file distributed with
+ this work for additional information regarding copyright ownership.
+ The ASF licenses this file to You under the Apache License, Version 2.0
+ (the "License"); you may not use this file except in compliance with
+ the License.  You may obtain a copy of the License at
+
+     http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
+-->
+
+<manualpage metafile="output-filters.xml.meta">
+  <parentdocument href="./">Developer Documentation</parentdocument>
+
+  <title>Guide to writing output filters</title>
+
+  <summary>
+    <p>There are a number of common pitfalls encountered when writing
+    output filters; this page aims to document best practice for
+    authors of new or existing filters.</p>
+
+    <p>This document is applicable to both version 2.0 and version 2.2
+    of the Apache HTTP Server; it specifically targets
+    <code>RESOURCE</code>-level or <code>CONTENT_SET</code>-level
+    filters, though some advice is generic to all types of filter.</p>
+  </summary>
+
+  <section id="basics">
+    <title>Filters and bucket brigades</title>
+
+    <p>Each time a filter is invoked, it is passed a <em>bucket
+    brigade</em>, containing a sequence of <em>buckets</em> which
+    represent both data content and metadata.  Every bucket has a
+    <em>bucket type</em>; a number of bucket types are defined and
+    used by the <code>httpd</code> core modules (and the
+    <code>apr-util</code> library which provides the bucket brigade
+    interface), but modules are free to define their own types.</p>
+
+    <note type="hint">Output filters must be prepared to process
+    buckets of non-standard types; with a few exceptions, a filter
+    need not care about the types of buckets being filtered.</note>
+
+    <p>A filter can tell whether a bucket represents either data or
+    metadata using the <code>APR_BUCKET_IS_METADATA</code> macro.
+    Generally, all metadata buckets should be passed up the filter
+    chain by an output filter.  Filters may transform, delete, and
+    insert data buckets as appropriate.</p>
+
+    <p>There are two metadata bucket types which all filters must pay
+    attention to: the <code>EOS</code> bucket type, and the
+    <code>FLUSH</code> bucket type.  An <code>EOS</code> bucket
+    indicates that the end of the response has been reached and no
+    further buckets need be processed.  A <code>FLUSH</code> bucket
+    indicates that the filter should flush any buffered buckets (if
+    applicable) up the filter chain immediately.</p>
+
+    <note type="hint"><code>FLUSH</code> buckets are sent when the
+    content generator (or a downstream filter) knows that there may be
+    a delay before more content can be sent.  By passing
+    <code>FLUSH</code> buckets up the filter chain immediately,
+    filters ensure that the client is not kept waiting for pending
+    data longer than necessary.</note>
+
+    <p>Filters can create <code>FLUSH</code> buckets and pass these up
+    the filter chain if desired.  Generating <code>FLUSH</code>
+    buckets unnecessarily, or too frequently, can harm network
+    utilisation since it may force large numbers of small packets to
+    be sent, rather than a small number of larger packets.  The
+    section on <a href="#nonblock">Non-blocking bucket reads</a>
+    covers a case where filters are encouraged to generate
+    <code>FLUSH</code> buckets.</p>
+
+    <example><title>Example bucket brigade</title>
+    <pre>HEAP FLUSH FILE EOS</pre></example>
+
+    <p>This shows a bucket brigade which may be passed to a filter; it
+    contains two metadata buckets (<code>FLUSH</code> and
+    <code>EOS</code>), and two data buckets (<code>HEAP</code> and
+    <code>FILE</code>).</p>
+
+  </section>
+
+  <section id="invocation">
+    <title>Filter invocation</title>
+
+    <p>For any given request, an output filter might be invoked only
+    once and given a single brigade representing the entire response.
+    It is also possible that the number of times a filter is invoked
+    is proportional to the size of the content being filtered, with
+    the filter being passed a brigade containing a single bucket each
+    time.  Filters must operate correctly in either case.</p>
+
+    <note type="warning">An output filter which allocates long-lived
+    memory every time it is invoked may consume memory proportional to
+    response size.  Output filters which need to allocate memory
+    should do so once per response; see <a href="#state">Maintaining
+    state</a> below.</note>
+
+    <p>An output filter can determine the final invocation for a given
+    response by the presence of an <code>EOS</code> bucket in the
+    brigade.  Any buckets in the brigade after an EOS should be
+    ignored.</p>
+
+    <p>An output filter should never pass an empty brigade up the
+    filter chain.  But, for good defensive programming, filters should
+    be prepared to accept an empty brigade, and do nothing.</p>
+
+    <example><title>How to handle an empty brigade</title>
+
+    <pre>apr_status_t dummy_filter(ap_filter_t *f, apr_bucket_brigade *bb)
+{
+    if (APR_BRIGADE_EMPTY(bb)) {
+        return APR_SUCCESS;
+    }
+    ....</pre></example>
+
+  </section>
+
+  <section id="brigade">
+    <title>Brigade structure</title>
+
+    <p>A bucket brigade is a doubly-linked list of buckets.  The list
+    is terminated (at both ends) by a <em>sentinel</em> which can be
+    distinguished from a normal bucket by comparing it with the
+    pointer returned by <code>APR_BRIGADE_SENTINEL</code>.  The list
+    sentinel is in fact not a valid bucket structure; any attempt to
+    call normal bucket functions (such as
+    <code>apr_bucket_read</code>) on the sentinel will have undefined
+    behaviour (i.e. will crash the process).</p>
+
+    <p>There are a variety of functions and macros for traversing and
+    manipulating bucket brigades; see the <a
+    href="http://apr.apache.org/docs/apr-util/trunk/group___a_p_r___util___bucket___brigades.html">apr_buckets.h</a>
+    header for complete coverage.  Commonly used macros include:
+
+    <dl>
+      <dt><code>APR_BRIGADE_FIRST(bb)</code></dt>
+      <dd>returns the first bucket in brigade bb</dd>
+
+      <dt><code>APR_BRIGADE_LAST(bb)</code></dt>
+      <dd>returns the last bucket in brigade bb</dd>
+
+      <dt><code>APR_BUCKET_NEXT(e)</code></dt>
+      <dd>gives the next bucket after bucket e</dd>
+
+      <dt><code>APR_BUCKET_PREV(e)</code></dt>
+      <dd>gives the bucket before bucket e</dd>
+
+    </dl></p>
+
+    <p>The <code>apr_bucket_brigade</code> structure itself is
+    allocated out of a pool, so if a filter creates a new brigade, it
+    must ensure that memory use is correctly bounded.  A filter which
+    allocates a new brigade out of the request pool
+    (<code>r->pool</code>) on every invocation, for example, will fall
+    foul of the <a href="#invocation">warning above</a> concerning
+    memory use.  Such a filter should instead create a brigade on the
+    first invocation per request, and store that brigade in its <a
+    href="#state">state structure</a>.</p>
+
+    <note type="warning">It is almost never advisable to use
+    <code>apr_brigade_destroy</code> to "destroy" a brigade.  The
+    memory used by the brigade structure will not be released by
+    calling this function (since it comes from a pool), but the
+    associated pool cleanup is unregistered.  Using
+    <code>apr_brigade_destroy</code> can in fact cause memory leaks;
+    if a "destroyed" brigade still contains buckets when its
+    containing pool is destroyed, those buckets will <em>not</em> be
+    immediately destroyed.</note>
+
+  </section>
+
+  <section id="buckets">
+
+    <title>Processing buckets</title>
+
+    <p>When dealing with non-metadata buckets, it is important to
+    understand that the "<code>apr_bucket *</code>" object is an
+    abstract <em>representation</em> of data:
+
+    <ol>
+      <li>The amount of data represented by the bucket may or may not
+      have a determinate length; for a bucket which represents data of
+      indeterminate length, the <code>->length</code> field is set to
+      the value <code>(apr_size_t)-1</code>.  The <code>PIPE</code>
+      bucket type is an example of a bucket type which has an indeterminate
+      length; it represents the output from a pipe.</li>
+
+      <li>The data represented by a bucket may or may not be mapped
+      into memory.  The <code>FILE</code> bucket type, for example,
+      represents data stored in a file on disk.</li>
+    </ol>
+
+    Filters read the data from a bucket using the
+    <code>apr_bucket_read</code> function.  When this function is
+    invoked, the bucket may <em>morph</em> into a different bucket
+    type, and may also insert a new bucket into the bucket brigade.
+    This must happen for buckets which represent data not mapped into
+    memory.</p>
+
+    <p>To give an example, consider a bucket brigade containing a
+    single <code>FILE</code> bucket representing an entire file, 24
+    kilobytes in size:</p>
+
+    <example><pre>FILE(0K-24K)</pre></example>
+
+    <p>When this bucket is read, it will read a block of data from the
+    file, morph into a <code>HEAP</code> bucket to represent that
+    data, and return the data to the caller.  It also inserts a new
+    <code>FILE</code> bucket representing the remainder of the file;
+    after the <code>apr_bucket_read</code> call, the brigade looks
+    like:</p>
+
+    <example><pre>HEAP(8K) FILE(8K-24K)</pre></example>
+
+  </section>
+
+  <section id="filtering">
+    <title>Filtering brigades</title>
+
+    <p>The basic function of any output filter will be to iterate
+    through the passed-in brigade and transform (or simply examine)
+    the content in some manner.  The implementation of the iteration
+    loop is critical to producing a well-behaved output filter.</p>
+
+    <p>Take an example which loops through the entire brigade as
+    follows:
+
+    <example><title>Bad output filter -- do not imitate!</title>
+    <pre>apr_bucket *e = APR_BRIGADE_FIRST(bb);
+const char *data;
+apr_size_t length;
+
+while (e != APR_BRIGADE_SENTINEL(bb)) {
+    apr_bucket_read(e, &amp;data, &amp;length, APR_BLOCK_READ);
+    e = APR_BUCKET_NEXT(e);
+}
+
+return ap_pass_brigade(f->next, bb);</pre></example>
+
+    The above implementation would consume memory proportional to
+    content size.  If passed a <code>FILE</code> bucket, for example,
+    the entire file contents would be read into memory as each
+    <code>apr_bucket_read</code> call morphed a <code>FILE</code>
+    bucket into a <code>HEAP</code> bucket.</p>
+
+    <p>In contrast, the implementation below will consume a fixed
+    amount of memory to filter any brigade; a temporary brigade is
+    needed and must be allocated only once per response, see the <a
+    href="#state">Maintaining state</a> section.</p>
+
+    <example><title>Better output filter</title>
+
+    <pre>apr_bucket *e;
+const char *data;
+apr_size_t length;
+
+while ((e = APR_BRIGADE_FIRST(bb)) != APR_BRIGADE_SENTINEL(bb)) {
+    rv = apr_bucket_read(e, &amp;data, &amp;length, APR_BLOCK_READ);
+    if (rv) ...;
+    /* Remove bucket e from bb. */
+    APR_BUCKET_REMOVE(e);
+    /* Insert it into temporary brigade. */
+    APR_BRIGADE_INSERT_HEAD(tmpbb, e);
+    /* Pass brigade upstream. */
+    rv = ap_pass_brigade(f->next, tmpbb);
+    if (rv) ...;
+    apr_brigade_cleanup(tmpbb);
+}</pre></example>
+
+  </section>
+
+  <section id="state">
+
+    <title>Maintaining state</title>
+
+    <p>A filter which needs to maintain state over multiple
+    invocations per response can use the <code>->ctx</code> field of
+    its <code>ap_filter_t</code> structure.  It is typical to store a
+    temporary brigade in such a structure, to avoid having to allocate
+    a new brigade per invocation as described in the <a
+    href="#brigade">Brigade structure</a> section.</p>
+
+    <example><title>Example code to maintain filter state</title>
+
+    <pre>struct dummy_state {
+    apr_bucket_brigade *tmpbb;
+    int filter_state;
+    ....
+};
+
+apr_status_t dummy_filter(ap_filter_t *f, apr_bucket_brigade *bb)
+{
+    struct dummy_state *state;
+
+    state = f->ctx;
+    if (state == NULL) {
+        /* First invocation for this response: initialise state structure. */
+        f->ctx = state = apr_palloc(f->r->pool, sizeof *state);
+
+        state->tmpbb = apr_brigade_create(f->r->pool, f->c->bucket_alloc);
+        state->filter_state = ...;
+    }
+    ...</pre></example>
+
+  </section>
+
+  <section id="buffer">
+    <title>Buffering buckets</title>
+
+    <p>If a filter decides to store buckets beyond the duration of a
+    single filter function invocation (for example storing them in its
+    <code>->ctx</code> state structure), those buckets must be <em>set
+    aside</em>.  This is necessary because some bucket types provide
+    buckets which represent temporary resources (such as stack memory)
+    which will fall out of scope as soon as the filter chain completes
+    processing the brigade.</p>
+
+    <p>To set aside a bucket, the <code>apr_bucket_setaside</code>
+    function can be called.  Not all bucket types can be set aside, but
+    if successful, the bucket will have morphed to ensure it has a
+    lifetime at least as long as the pool given as an argument to the
+    <code>apr_bucket_setaside</code> function.</p>
+
+    <p>Alternatively, the <code>ap_save_brigade</code> function can be
+    used, which will create a new brigade containing buckets with a
+    lifetime as long as the given pool argument.  This function must
+    be used with great care, however: on return it guarantees that all
+    the buckets in the returned brigade will represent data mapped
+    into memory.  If given an input brigade containing, for example, a
+    PIPE bucket, <code>ap_save_brigade</code> will consume an
+    arbitrary amount of memory to store the entire output of the
+    pipe.</p>
+
+    <note type="warning">Filters must ensure that any buffered data is
+    processed and passed up the filter chain during the last
+    invocation for a given response (a brigade containing an EOS
+    bucket).  Otherwise such data will be lost.</note>
+
+  </section>
+
+  <section id="nonblock">
+    <title>Non-blocking bucket reads</title>
+
+    <p>The <code>apr_bucket_read</code> function takes an
+    <code>apr_read_type_e</code> argument which determines whether a
+    <em>blocking</em> or <em>non-blocking</em> read will be performed
+    from the data source.  A good filter will first attempt to read
+    from every data bucket using a non-blocking read; if that fails
+    with <code>APR_EAGAIN</code>, it should send a <code>FLUSH</code>
+    bucket up the filter chain, and retry using a blocking read.</p>
+
+    <p>This mode of operation ensures that any filters further up the
+    filter chain will flush any buffered buckets if a slow content
+    source is being used.</p>
+
+    <p>A CGI script is an example of a slow content source which is
+    implemented as a bucket type.  <module>mod_cgi</module> will send
+    <code>PIPE</code> buckets which represent the output from a CGI
+    script; reading from such a bucket will block when waiting for the
+    CGI script to produce more output.</p>
+
+    <example>
+      <title>Example code using non-blocking bucket reads</title>
+
+<pre>apr_bucket *e;
+apr_read_type_e mode = APR_NONBLOCK_READ;
+
+while ((e = APR_BRIGADE_FIRST(bb)) != APR_BRIGADE_SENTINEL(bb)) {
+    apr_status_t rv;
+
+    rv = apr_bucket_read(e, &amp;data, &amp;length, mode);
+    if (APR_STATUS_IS_EAGAIN(rv) &amp;&amp; mode == APR_NONBLOCK_READ) {
+        /* Pass up a brigade containing a flush bucket: */
+        APR_BRIGADE_INSERT_TAIL(tmpbb, apr_bucket_flush_create(...));
+        rv = ap_pass_brigade(f->next, tmpbb);
+        apr_brigade_cleanup(tmpbb);
+        if (rv != APR_SUCCESS) return rv;
+
+        /* Retry, using a blocking read. */
+        mode = APR_BLOCK_READ;
+        continue;
+    } else if (rv != APR_SUCCESS) {
+        /* handle errors */
+    }
+
+    /* Next time, try a non-blocking read first. */
+    mode = APR_NONBLOCK_READ;
+    ...
+}</pre></example>
+
+  </section>
+
+  <section id="rules">
+    <title>Ten rules for output filters</title>
+
+    <p>In summary, here is a set of rules for all output filters to
+    follow:</p>
+
+    <ol>
+      <li>Output filters should not pass empty brigades up the filter
+      chain, but should be tolerant of being passed empty
+      brigades.</li>
+
+      <li>Output filters must pass all metadata buckets up the filter
+      chain; <code>FLUSH</code> buckets should be respected by passing
+      any pending or buffered buckets up the filter chain.</li>
+
+      <li>Output filters should ignore any buckets following an
+      <code>EOS</code> bucket.</li>
+
+      <li>Output filters which read all the buckets in a brigade must
+      process a fixed number of buckets (or amount of data) at a time,
+      to ensure that memory consumption is not proportional to the
+      size of the content being filtered.</li>
+
+      <li>Output filters should be agnostic with respect to bucket
+      types, and must be able to process buckets of unfamiliar
+      type.</li>
+
+      <li>After calling <code>ap_pass_brigade</code> to pass a brigade
+      up the filter chain, output filters should call
+      <code>apr_brigade_cleanup</code> to ensure the brigade is empty
+      before reusing that brigade structure; output filters should
+      never use <code>apr_brigade_destroy</code> to "destroy"
+      brigades.</li>
+
+      <li>Output filters must <em>set aside</em> any buckets which are
+      preserved beyond the duration of the filter function.</li>
+
+      <li>Output filters must not ignore the return value of
+      <code>ap_pass_brigade</code>, and must return appropriate errors
+      back down the filter chain.</li>
+
+      <li>Output filters must only create a fixed number of bucket
+      brigades for each response, rather than one per invocation.</li>
+
+      <li>Output filters should first attempt non-blocking reads from
+      each data bucket, and send a <code>FLUSH</code> bucket up the
+      filter chain if the read would block, before retrying with a blocking
+      read.</li>
+
+    </ol>
+
+  </section>
+
+</manualpage>

Property changes on: docs/manual/developer/output-filters.xml
___________________________________________________________________
Name: svn:eol-style
   + native

Index: modules/experimental/config.m4
===================================================================
--- modules/experimental/config.m4 (revision 519147)
+++ modules/experimental/config.m4 (working copy)
@@ -4,5 +4,6 @@
 APACHE_MODULE(example, example and demo module, , , no)
 APACHE_MODULE(case_filter, example uppercase conversion filter, , , no)
 APACHE_MODULE(case_filter_in, example uppercase conversion input filter, , , no)
+APACHE_MODULE(buffer_filter, example output filter which buffers buckets, , , no)
 
 APACHE_MODPATH_FINISH
Index: modules/experimental/mod_buffer_filter.c
===================================================================
--- modules/experimental/mod_buffer_filter.c (revision 0)
+++ modules/experimental/mod_buffer_filter.c (revision 0)
@@ -0,0 +1,124 @@
+/* Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#include "httpd.h"
+#include "http_config.h"
+#include "apr_buckets.h"
+#include "apr_general.h"
+#include "apr_lib.h"
+#include "util_filter.h"
+#include "http_request.h"
+#include "http_log.h"
+
+struct buffer_filter_state {
+    apr_bucket_brigade *tmpbb;
+    apr_size_t tmplen;
+};
+
+#define MAX_BUFFER_BYTES (8000)
+
+static apr_status_t buffer_filter(ap_filter_t *f, apr_bucket_brigade *bb)
+{
+    struct buffer_filter_state *state;
+    apr_read_type_e mode = APR_NONBLOCK_READ;
+    apr_bucket *e;
+
+    state = f->ctx;
+    if (state == NULL) {
+        /* First invocation for this response: initialise state structure. */
+        f->ctx = state = apr_palloc(f->r->pool, sizeof *state);
+
+        state->tmpbb = apr_brigade_create(f->r->pool, f->c->bucket_alloc);
+        state->tmplen = 0;
+    }
+
+    /* Process passed-in brigade. */
+    while ((e = APR_BRIGADE_FIRST(bb)) != APR_BRIGADE_SENTINEL(bb)) {
+        apr_size_t length;
+        const char *data;
+        apr_status_t rv;
+
+        if (!APR_BUCKET_IS_METADATA(e)) {
+            rv = apr_bucket_read(e, &data, &length, mode);
+            if (APR_STATUS_IS_EAGAIN(rv) && mode == APR_NONBLOCK_READ) {
+                /* Pass up a brigade containing a flush bucket: */
+                APR_BRIGADE_INSERT_TAIL(state->tmpbb,
+                                        apr_bucket_flush_create(f->c->bucket_alloc));
+
+                rv = ap_pass_brigade(f->next, state->tmpbb);
+                apr_brigade_cleanup(state->tmpbb);
+                state->tmplen = 0;
+                if (rv != APR_SUCCESS) {
+                    return rv;
+                }
+
+                /* Retry, using a blocking read. */
+                mode = APR_BLOCK_READ;
+                continue;
+            }
+            else if (rv != APR_SUCCESS) {
+                ap_log_rerror(APLOG_MARK, APLOG_ERR, rv, f->r,
+                              "could not read from bucket");
+                return APR_EGENERAL;
+            }
+
+            /* Next time, try a non-blocking read first. */
+            mode = APR_NONBLOCK_READ;
+
+            state->tmplen += length;
+        }
+
+        APR_BUCKET_REMOVE(e);
+        APR_BRIGADE_INSERT_TAIL(state->tmpbb, e);
+
+        if (APR_BUCKET_IS_FLUSH(e) || APR_BUCKET_IS_EOS(e)
+            || state->tmplen >= MAX_BUFFER_BYTES) {
+            rv = ap_pass_brigade(f->next, state->tmpbb);
+            apr_brigade_cleanup(state->tmpbb);
+            state->tmplen = 0;
+
+            if (rv) {
+                return rv;
+            }
+        }
+        else {
+            rv = apr_bucket_setaside(e, f->r->pool);
+            if (rv) {
+                ap_log_rerror(APLOG_MARK, APLOG_ERR, rv, f->r,
+                              "could not setaside bucket");
+                return APR_EGENERAL;
+            }
+        }
+    }
+
+    return APR_SUCCESS;
+}
+
+static void register_hooks(apr_pool_t *p)
+{
+    ap_register_output_filter("BUFFER", buffer_filter, NULL, AP_FTYPE_RESOURCE);
+}
+
+module AP_MODULE_DECLARE_DATA buffer_filter_module =
+{
+    STANDARD20_MODULE_STUFF,
+    NULL,                       /* dir config creator */
+    NULL,                       /* dir merger --- default is to override */
+    NULL,                       /* server config */
+    NULL,                       /* merge server config */
+    NULL,                       /* command apr_table_t */
+    register_hooks              /* register hooks */
+};

Property changes on: modules/experimental/mod_buffer_filter.c
___________________________________________________________________
Name: svn:eol-style
   + native
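
For anyone who wants to poke at the flush decision in buffer_filter without building httpd, here is a rough standalone model of it. This is only a sketch, not part of the patch: the names model_kind, model_state, model_consume and MODEL_MAX_BUFFER_BYTES are made up for illustration and none of them are APR or httpd identifiers. It only tracks byte counts; reading, setting aside and passing brigades are left out entirely.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for bucket kinds; these are NOT APR types. */
enum model_kind { MODEL_DATA, MODEL_FLUSH, MODEL_EOS };

/* Mirrors MAX_BUFFER_BYTES in the patch; the name is illustrative only. */
#define MODEL_MAX_BUFFER_BYTES 8000

struct model_state {
    size_t buffered;   /* bytes currently held back in the temporary brigade */
    unsigned passes;   /* number of times the buffered data was passed on */
};

/* Account for one bucket.  Returns 1 if the buffered data would be passed
 * up the filter chain at this point, 0 if it would be kept (set aside). */
static int model_consume(struct model_state *st, enum model_kind kind,
                         size_t length)
{
    assert(st != NULL);

    /* Only data buckets contribute to the buffered byte count. */
    if (kind == MODEL_DATA) {
        st->buffered += length;
    }

    /* Pass on for FLUSH, for EOS, or once the threshold is reached. */
    if (kind == MODEL_FLUSH || kind == MODEL_EOS
        || st->buffered >= MODEL_MAX_BUFFER_BYTES) {
        st->buffered = 0;
        st->passes++;
        return 1;
    }

    return 0;
}
```

Calling model_consume once per bucket, in brigade order, shows that the amount held back stays bounded by the threshold plus one bucket's worth of data, whatever the response size.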