On 02/10/2008 05:39 PM, Dirk-Willem van Gulik wrote:



See below for some aloud thinking -- note that this is not yet well thought out (all I did was try to minimize the overlap between disk_cache, memcached, distcache and some commercial thing -- and tried to move as much RFC2616 knowledge into mod_cache.c*).

-    internal ap_cache_cacheable_hdr to hold all the RFC 2616
        non cachable headers info as before.

-    ap_cache_cacheable_hdrs_out introduced as per earlier discussion
    which knows about error headers and mandatory content setting.

-    ap_cache_cacheable_hdrs_in introduced - which has an optional
    argument to further curtail the headers returned by just those
    in the 'Vary'.

-    ap_normalize_vary_to_array() introduced. Which can help
    caching storage modules to create a key across the Vary
    relevant fields, if any.

    => given that they all have to create this -- wondering if
    we should make this part of the cache_info struct already.

Maybe not a bad idea.


-    Likewise a ap_generate_vary_key() which understands
    vary if/when needed.

Below more or less cleans mod_disk_cache completly of having to understand http (or any protocol aspect) - but for ap_generate_vary_key() -- but it is not yet HTTP neutral (i.e. a IMAP protocol module would have to overwrite things it has not yet XS to).

Thoughts,

Dw.


Some comments:

1. Some style nitpicks (especially indenting, sometimes typos in the comments) 
that
   make the patch sometimes hard to read.
2. Don't forget to add a minor bump once you commit.
3. While it may be a worthwhile goal to improve the error handling of 
mod_disk_cache
   I would love to see this addressed in a separate patch as these pieces of 
code distract
   while reading the patch and are not strictly connected to your goal of 
abstracting
   logic out of mod_disk_cache.
4. Looks good in general, some more questions inline


*: not yet nearly enough - i.e. lacking the whole case (in)sensitive, date, etc stuff.

Index: cache_util.c
===================================================================
--- cache_util.c    (revision 620288)
+++ cache_util.c    (working copy)
@@ -571,10 +573,10 @@
     return apr_pstrdup(p, hashfile);
 }

-/* Create a new table consisting of those elements from an input
+/* Create a new table consisting of those elements from an
  * headers table that are allowed to be stored in a cache.
  */
-CACHE_DECLARE(apr_table_t *)ap_cache_cacheable_hdrs_out(apr_pool_t *pool,
+CACHE_DECLARE(apr_table_t *)ap_cache_cacheable_hdrs(apr_pool_t *pool,
                                                         apr_table_t *t,
                                                         server_rec *s)
 {
@@ -582,12 +584,17 @@
     char **header;
     int i;

+    if (t == NULL) {
+    return NULL;
+    };
+
     /* Make a copy of the headers, and remove from
      * the copy any hop-by-hop headers, as defined in Section
      * 13.5.1 of RFC 2616
      */
     apr_table_t *headers_out;
     headers_out = apr_table_copy(pool, t);
+
     apr_table_unset(headers_out, "Connection");
     apr_table_unset(headers_out, "Keep-Alive");
     apr_table_unset(headers_out, "Proxy-Authenticate");
@@ -599,6 +606,7 @@

     conf = (cache_server_conf *)ap_get_module_config(s->module_config,
                                                      &cache_module);
+
     /* Remove the user defined headers set with CacheIgnoreHeaders.
      * This may break RFC 2616 compliance on behalf of the administrator.
      */
@@ -608,3 +616,170 @@
     }
     return headers_out;
 }
+
+/* Create a new table consisting of those elements from an input
+ * headers table that are allowed to be stored in a cache. Optionally
+ * filtered to just the fields passed in the vary_filter array.

Why this? Do you want to save space by not storing the input headers or
only storing those relevant to varying?
From a first glance this looks like to be a good idea, but it may be better
separated to another patch to ease reading. This could possibly used later
to ease the logic in cache_select in cache_storage.c, as today its vary logic
is unneeded IMHO in the mod_disk_cache case. IMHO I cannot open the 
mod_disk_cache
entity if the cached resource had vary headers and the cached headers do not
match the ones from the incoming request. I would not have a hdrs file in this
case since the path to it is based on the hashed values of the input headers
that vary.

+ *
+ * @return a table with the valid input headers or NULL if none are relevant.
+ */
+CACHE_DECLARE(apr_table_t *)ap_cache_cacheable_hdrs_in(request_rec * r, apr_array_header_t * vary_filter)
+{
+    apr_table_t *headers_in, * vary_filtered;
+        int i;
+
+    if (r->headers_in == NULL)
+        return NULL;
+
+ headers_in = ap_cache_cacheable_hdrs(r->pool, r->headers_in, r->server); + + if (!vary_filter)
+        return apr_is_empty_table(headers_in) ? NULL: headers_in;
+
+        /* The vary array is most propably the smallest of the two - su
+     * only copy just those fields across.
+     */
+         vary_filtered = apr_table_make(r->pool, vary_filter->nelts);
+        for (i = 0; i < vary_filter->nelts; i++) {
+                const char *key = ((const char**)vary_filter->elts)[i];
+ const char * val = apr_table_get(headers_in, key); /* XXX: Not case insensitive */
+                if (val)
+                        apr_table_add(vary_filtered, key, val);
+        };
+
+    return apr_is_empty_table(vary_filtered) ? NULL : vary_filtered;
+}
+
+/* Create a new table consisting of those elements from an output
+ * headers table that are allowed to be stored in a cache;
+ * ensure there is a content type and capture any errors.
+ */
+CACHE_DECLARE(apr_table_t *)ap_cache_cacheable_hdrs_out(request_rec * r)
+{
+    apr_table_t *headers_out;
+
+    headers_out = ap_cache_cacheable_hdrs(r->pool, r->headers_out,
+                                                  r->server);
+
+        if (!apr_table_get(headers_out, "Content-Type")
+            && r->content_type) {
+            apr_table_setn(headers_out, "Content-Type",
+                           ap_make_content_type(r, r->content_type));
+        }
+
+        headers_out = apr_table_overlay(r->pool, headers_out,
+                                        r->err_headers_out);
+
+    return headers_out;
+}
+
+static int array_alphasort(const void *fn1, const void *fn2)
+{
+    return strcmp(*(char**)fn1, *(char**)fn2);
+}
+
+static void vary_tokens_to_array(apr_pool_t *p, const char *data,
+                            apr_array_header_t *arr)
+{
+    char *token;
+
+    while ((token = ap_get_list_item(p, &data)) != NULL) {
+        *((const char **) apr_array_push(arr)) = token;
+    }
+
+    qsort((void *) arr->elts, arr->nelts,
+         sizeof(char *), array_alphasort);
+}
+
+/* Given a header table; extract any Vary present and return
+ * the values in a predictable way (i.e. Sort it so that
+ * "Vary: A, B" and "Vary: B, A" are stored the same; and take
+ * care of any case sensitive issues. Onb exit the array will
+ * either be NULL (no VARYance) or a sorted array.
+ *
+ * @return vary-array or NULL, if no variance
+ */
+CACHE_DECLARE(apr_array_header_t*) ap_normalize_vary_to_array(apr_pool_t *p, apr_table_t * headers)
+{
+    apr_array_header_t* varray = NULL;
+    const char *vary;
+ + if ((headers) && ((vary= apr_table_get(headers, "Vary")) != NULL))
+    {
+        varray = apr_array_make(p, 6, sizeof(char*));
+            vary_tokens_to_array(p, vary, varray);
+    }
+ + return varray;
+}
+
+/* Generate a new key based on the normal key (URI) and normalized values from + * the Vary array. The vary array is assumed to be in a stable order and format.
+ */
+CACHE_DECLARE(const char*) ap_generate_vary_key(apr_pool_t *p, apr_table_t * input_headers, + apr_array_header_t *varray, const char *oldkey)
+{
+    struct iovec *iov;
+    int i, k;
+    int nvec;
+    const char *header;
+    const char **elts;
+
+    if (varray == NULL);
+    return oldkey;
+
+    nvec = (varray->nelts * 2) + 1;
+    iov = apr_palloc(p, sizeof(struct iovec) * nvec);
+    elts = (const char **) varray->elts;
+
+    /* TODO:
+     *    - Handle multiple-value headers better. (sort them?)
+     *    - Handle Case in-sensitive Values better.
+ * This isn't the end of the world, since it just lowers the cache
+     *        hit rate, but it would be nice to fix.
+     *
+     * The majority are case insenstive if they are values (encoding etc).
+     * Most of rfc2616 is case insensitive on header contents.
+     *
+     * So the better solution may be to identify headers which should be
+     * treated case-sensitive?
+     *  HTTP URI's (3.2.3) [host and scheme are insensitive]
+     *  HTTP method (5.1.1)
+     *  HTTP-date values (3.3.1)
+     *  3.7 Media Types [exerpt]
+     *     The type, subtype, and parameter attribute names are case-
+ * insensitive. Parameter values might or might not be case-sensitive,
+     *     depending on the semantics of the parameter name.
+     *  4.20 Except [exerpt]
+ * Comparison of expectation values is case-insensitive for unquoted + * tokens (including the 100-continue token), and is case-sensitive for
+     *     quoted-string expectation-extensions.
+     *
+ * XX we are currently concatenating the values. With some puzzling one could + * set a header value which looks like the 'next/previous' value - and hence + * cause a value which is not the 'real' one. This may be a security risk. + * Ideally we should use a special value which cannot occur in a header + * legally (or is escaped/wacked). Given that (at least) a Header cannot
+     *     contain a space or ':' - may be an idea to insert those.
+     */

I currently do not understand your worries here. Could you please explain this
in more detail?



-    /* Parse the vary header and dump those fields from the headers_in. */
- /* FIXME: Make call to the same thing cache_select calls to crack Vary. */
-    if (r->headers_in) {
+    /* Parse the vary header and dump those fields from the headers_in.
+     * But only do so if needed - i.e. there is a Vary - and then limit
+     * it to the headers needed to validate the Vary.
+     */

See my comment above regarding vary filtering.

+    if (r->headers_in && varray) {
         apr_table_t *headers_in;
-
-        headers_in = ap_cache_cacheable_hdrs_out(r->pool, r->headers_in,
-                                                 r->server);
+        headers_in = ap_cache_cacheable_hdrs_in(r, varray);
+        if (headers_in) {
         rv = store_table(dobj->hfd, headers_in);
         if (rv != APR_SUCCESS) {
+                 apr_file_close(dobj->hfd); /* flush and close */
+                 apr_file_remove(dobj->tempfile, r->pool);
             return rv;
         }
     }
+    }

Regards

Rüdiger

Reply via email to