On 02/10/2008 05:39 PM, Dirk-Willem van Gulik wrote:
See below for some aloud thinking -- note that this is not yet well
thought out (all I did was try to minimize the overlap between
disk_cache, memcached, distcache and some commercial thing -- and tried
to move as much RFC2616 knowledge into mod_cache.c*).
- internal ap_cache_cacheable_hdr to hold all the RFC 2616
non cachable headers info as before.
- ap_cache_cacheable_hdrs_out introduced as per earlier discussion
which knows about error headers and mandatory content setting.
- ap_cache_cacheable_hdrs_in introduced - which has an optional
argument to further curtail the headers returned by just those
in the 'Vary'.
- ap_normalize_vary_to_array() introduced. Which can help
caching storage modules to create a key across the Vary
relevant fields, if any.
=> given that they all have to create this -- wondering if
we should make this part of the cache_info struct already.
Maybe not a bad idea.
- Likewise a ap_generate_vary_key() which understands
vary if/when needed.
Below more or less cleans mod_disk_cache completly of having to
understand http (or any protocol aspect) - but for
ap_generate_vary_key() -- but it is not yet HTTP neutral (i.e. a IMAP
protocol module would have to overwrite things it has not yet XS to).
Thoughts,
Dw.
Some comments:
1. Some style nitpicks (especially indenting, sometimes typos in the comments)
that
make the patch sometimes hard to read.
2. Don't forget to add a minor bump once you commit.
3. While it may be a worthwhile goal to improve the error handling of
mod_disk_cache
I would love to see this addressed in a separate patch as these pieces of
code distract
while reading the patch and are not strictly connected to your goal of
abstracting
logic out of mod_disk_cache.
4. Looks good in general, some more questions inline
*: not yet nearly enough - i.e. lacking the whole case (in)sensitive,
date, etc stuff.
Index: cache_util.c
===================================================================
--- cache_util.c (revision 620288)
+++ cache_util.c (working copy)
@@ -571,10 +573,10 @@
return apr_pstrdup(p, hashfile);
}
-/* Create a new table consisting of those elements from an input
+/* Create a new table consisting of those elements from an
* headers table that are allowed to be stored in a cache.
*/
-CACHE_DECLARE(apr_table_t *)ap_cache_cacheable_hdrs_out(apr_pool_t *pool,
+CACHE_DECLARE(apr_table_t *)ap_cache_cacheable_hdrs(apr_pool_t *pool,
apr_table_t *t,
server_rec *s)
{
@@ -582,12 +584,17 @@
char **header;
int i;
+ if (t == NULL) {
+ return NULL;
+ };
+
/* Make a copy of the headers, and remove from
* the copy any hop-by-hop headers, as defined in Section
* 13.5.1 of RFC 2616
*/
apr_table_t *headers_out;
headers_out = apr_table_copy(pool, t);
+
apr_table_unset(headers_out, "Connection");
apr_table_unset(headers_out, "Keep-Alive");
apr_table_unset(headers_out, "Proxy-Authenticate");
@@ -599,6 +606,7 @@
conf = (cache_server_conf *)ap_get_module_config(s->module_config,
&cache_module);
+
/* Remove the user defined headers set with CacheIgnoreHeaders.
* This may break RFC 2616 compliance on behalf of the administrator.
*/
@@ -608,3 +616,170 @@
}
return headers_out;
}
+
+/* Create a new table consisting of those elements from an input
+ * headers table that are allowed to be stored in a cache. Optionally
+ * filtered to just the fields passed in the vary_filter array.
Why this? Do you want to save space by not storing the input headers or
only storing those relevant to varying?
From a first glance this looks like to be a good idea, but it may be better
separated to another patch to ease reading. This could possibly used later
to ease the logic in cache_select in cache_storage.c, as today its vary logic
is unneeded IMHO in the mod_disk_cache case. IMHO I cannot open the
mod_disk_cache
entity if the cached resource had vary headers and the cached headers do not
match the ones from the incoming request. I would not have a hdrs file in this
case since the path to it is based on the hashed values of the input headers
that vary.
+ *
+ * @return a table with the valid input headers or NULL if none are
relevant.
+ */
+CACHE_DECLARE(apr_table_t *)ap_cache_cacheable_hdrs_in(request_rec * r,
apr_array_header_t * vary_filter)
+{
+ apr_table_t *headers_in, * vary_filtered;
+ int i;
+
+ if (r->headers_in == NULL)
+ return NULL;
+
+ headers_in = ap_cache_cacheable_hdrs(r->pool, r->headers_in,
r->server);
+
+ if (!vary_filter)
+ return apr_is_empty_table(headers_in) ? NULL: headers_in;
+
+ /* The vary array is most propably the smallest of the two - su
+ * only copy just those fields across.
+ */
+ vary_filtered = apr_table_make(r->pool, vary_filter->nelts);
+ for (i = 0; i < vary_filter->nelts; i++) {
+ const char *key = ((const char**)vary_filter->elts)[i];
+ const char * val = apr_table_get(headers_in, key); /*
XXX: Not case insensitive */
+ if (val)
+ apr_table_add(vary_filtered, key, val);
+ };
+
+ return apr_is_empty_table(vary_filtered) ? NULL : vary_filtered;
+}
+
+/* Create a new table consisting of those elements from an output
+ * headers table that are allowed to be stored in a cache;
+ * ensure there is a content type and capture any errors.
+ */
+CACHE_DECLARE(apr_table_t *)ap_cache_cacheable_hdrs_out(request_rec * r)
+{
+ apr_table_t *headers_out;
+
+ headers_out = ap_cache_cacheable_hdrs(r->pool, r->headers_out,
+ r->server);
+
+ if (!apr_table_get(headers_out, "Content-Type")
+ && r->content_type) {
+ apr_table_setn(headers_out, "Content-Type",
+ ap_make_content_type(r, r->content_type));
+ }
+
+ headers_out = apr_table_overlay(r->pool, headers_out,
+ r->err_headers_out);
+
+ return headers_out;
+}
+
+static int array_alphasort(const void *fn1, const void *fn2)
+{
+ return strcmp(*(char**)fn1, *(char**)fn2);
+}
+
+static void vary_tokens_to_array(apr_pool_t *p, const char *data,
+ apr_array_header_t *arr)
+{
+ char *token;
+
+ while ((token = ap_get_list_item(p, &data)) != NULL) {
+ *((const char **) apr_array_push(arr)) = token;
+ }
+
+ qsort((void *) arr->elts, arr->nelts,
+ sizeof(char *), array_alphasort);
+}
+
+/* Given a header table; extract any Vary present and return
+ * the values in a predictable way (i.e. Sort it so that
+ * "Vary: A, B" and "Vary: B, A" are stored the same; and take
+ * care of any case sensitive issues. Onb exit the array will
+ * either be NULL (no VARYance) or a sorted array.
+ *
+ * @return vary-array or NULL, if no variance
+ */
+CACHE_DECLARE(apr_array_header_t*)
ap_normalize_vary_to_array(apr_pool_t *p, apr_table_t * headers)
+{
+ apr_array_header_t* varray = NULL;
+ const char *vary;
+
+ if ((headers) && ((vary= apr_table_get(headers, "Vary")) != NULL))
+ {
+ varray = apr_array_make(p, 6, sizeof(char*));
+ vary_tokens_to_array(p, vary, varray);
+ }
+
+ return varray;
+}
+
+/* Generate a new key based on the normal key (URI) and normalized
values from
+ * the Vary array. The vary array is assumed to be in a stable order
and format.
+ */
+CACHE_DECLARE(const char*) ap_generate_vary_key(apr_pool_t *p,
apr_table_t * input_headers,
+ apr_array_header_t *varray, const char
*oldkey)
+{
+ struct iovec *iov;
+ int i, k;
+ int nvec;
+ const char *header;
+ const char **elts;
+
+ if (varray == NULL);
+ return oldkey;
+
+ nvec = (varray->nelts * 2) + 1;
+ iov = apr_palloc(p, sizeof(struct iovec) * nvec);
+ elts = (const char **) varray->elts;
+
+ /* TODO:
+ * - Handle multiple-value headers better. (sort them?)
+ * - Handle Case in-sensitive Values better.
+ * This isn't the end of the world, since it just lowers the
cache
+ * hit rate, but it would be nice to fix.
+ *
+ * The majority are case insenstive if they are values (encoding etc).
+ * Most of rfc2616 is case insensitive on header contents.
+ *
+ * So the better solution may be to identify headers which should be
+ * treated case-sensitive?
+ * HTTP URI's (3.2.3) [host and scheme are insensitive]
+ * HTTP method (5.1.1)
+ * HTTP-date values (3.3.1)
+ * 3.7 Media Types [exerpt]
+ * The type, subtype, and parameter attribute names are case-
+ * insensitive. Parameter values might or might not be
case-sensitive,
+ * depending on the semantics of the parameter name.
+ * 4.20 Except [exerpt]
+ * Comparison of expectation values is case-insensitive for
unquoted
+ * tokens (including the 100-continue token), and is
case-sensitive for
+ * quoted-string expectation-extensions.
+ *
+ * XX we are currently concatenating the values. With some
puzzling one could
+ * set a header value which looks like the 'next/previous'
value - and hence
+ * cause a value which is not the 'real' one. This may be a
security risk.
+ * Ideally we should use a special value which cannot occur in
a header
+ * legally (or is escaped/wacked). Given that (at least) a
Header cannot
+ * contain a space or ':' - may be an idea to insert those.
+ */
I currently do not understand your worries here. Could you please explain this
in more detail?
- /* Parse the vary header and dump those fields from the headers_in. */
- /* FIXME: Make call to the same thing cache_select calls to crack
Vary. */
- if (r->headers_in) {
+ /* Parse the vary header and dump those fields from the headers_in.
+ * But only do so if needed - i.e. there is a Vary - and then limit
+ * it to the headers needed to validate the Vary.
+ */
See my comment above regarding vary filtering.
+ if (r->headers_in && varray) {
apr_table_t *headers_in;
-
- headers_in = ap_cache_cacheable_hdrs_out(r->pool, r->headers_in,
- r->server);
+ headers_in = ap_cache_cacheable_hdrs_in(r, varray);
+ if (headers_in) {
rv = store_table(dobj->hfd, headers_in);
if (rv != APR_SUCCESS) {
+ apr_file_close(dobj->hfd); /* flush and close */
+ apr_file_remove(dobj->tempfile, r->pool);
return rv;
}
}
+ }
Regards
Rüdiger