[PATCH] mod_negotiation, suffix order

Francis Daly Mon, 30 Apr 2001 10:39:53 -0700


Hi there,

this is essentially a repost of some mails earlier this month with the
same patch and a similar Subject:.

Appended to this mail is a patch to remove the requirements on the
order of suffixes when using MultiViews / mod_negotiation.  It does
have the down side of increasing the number of valid URLs for the same
content, but to a limited extent that is implicit in mod_negotiation
anyway.

The patch is relative to the version of mod_negotiation.c distributed
with apache-2.0.16.  There is a newer version in CVS, but the patch
should still apply cleanly.

But first, some notes:

The current method takes the "file" part of r->filename (either the
bit after the final / in the URI, or the value of DirectoryIndex).
First, if the exact filename matches, mod_negotiation declines to
handle it.  Second, for each file in the directory, it tries to match
/^file\./.
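
As a rough illustration of the current test (a minimal sketch, not the
module's code): the helper name current_match() and its two parameters
are made up for this example, and the earlier exact-filename check is
not modelled.  It mirrors the strncmp() and dirent.name[prefix_len]
comparison seen in the patch context below.

```c
#include <string.h>

/* Hypothetical helper: returns 1 if the directory entry `name` would be
 * picked up for `request` by the current method, i.e. if `name` starts
 * with `request` followed immediately by a dot (/^request\./). */
static int current_match(const char *request, const char *name)
{
    size_t prefix_len = strlen(request);

    if (strncmp(name, request, prefix_len) != 0) {
        return 0;               /* prefix differs */
    }
    return name[prefix_len] == '.';
}
```

With this, "name" and "name.html" both pick up "name.html.en", but
"name.en" does not.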

The patched method does one extra strchr() and uses a few extra ints
and char pointers; for a requested file "file" (no dots) it then does
the same thing as the current method.

However, if the r->filename is actually "file.s1.s2.sZ" (with dots),
the current way looks for /^file\.s1\.s2\.sZ\./; the patched way looks
for each of /^file\./, /\.s1/, /\.s2/, /\.sZ/.  It bails out at the
first failure.

Extra pointer and string manipulation is needed to do this, per dot in
the requested file name, per file in the directory.
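
The per-suffix test can be sketched as follows.  This is only an
approximation of the patch, under some assumptions: patched_match() and
contains_n() are names invented for this example, and unlike the patch
the sketch never writes into the request string (it compares bounded
substrings instead of temporarily planting a '\0').  As in the patch,
the explicit length check is applied only when the request has more
than one dot.  The later MIME-type checks are not modelled.

```c
#include <string.h>

/* Hypothetical helper: does `hay` contain the first n chars of `needle`? */
static int contains_n(const char *hay, const char *needle, size_t n)
{
    for (; *hay; ++hay) {
        if (strncmp(hay, needle, n) == 0) {
            return 1;
        }
    }
    return 0;
}

/* Hypothetical helper: returns 1 if the directory entry `name` would be
 * passed on as a candidate for `request` by the patched matching. */
static int patched_match(const char *request, const char *name)
{
    size_t req_len = strlen(request);
    const char *dot = strchr(request, '.');
    const char *seg;

    if (!dot) {
        /* Given "name", check for "name." (same as the current method) */
        return strncmp(name, request, req_len) == 0 && name[req_len] == '.';
    }

    /* Given "name.suffixes", check for "name." at the start */
    if (strncmp(name, request, (size_t)(dot - request) + 1) != 0) {
        return 0;
    }

    /* More than one dot: give up if the request is longer than the file */
    if (strchr(dot + 1, '.') && req_len > strlen(name)) {
        return 0;
    }

    /* Each ".sufN" of the request must occur somewhere in the name */
    for (seg = dot; seg; ) {
        const char *next = strchr(seg + 1, '.');
        size_t seg_len = next ? (size_t)(next - seg) : strlen(seg);

        if (!contains_n(name, seg, seg_len)) {
            return 0;           /* bail out at the first failure */
        }
        seg = next;
    }
    return 1;
}
```

For example, patched_match("name.en.html", "name.html.en") returns 1,
while patched_match("m.o.r.p.n", "m.n.o.p.q.html") returns 0 because
".r" does not occur in the file name.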

Some consequences of this implementation are:

Current method: the file "name.html.en" is accessible only through the
(partial) URIs "name", "name.html", and "name.html.en".

Patched method: the same three work, as do "name.en" and
"name.en.html".  That is good.  However, so do "name.htm",
"name.htm.en", and "name.en.htm".  That may be considered good.
Further, so do "name.h", "name.h.h", "name...h.e.e..e.h.h.", and an
infinite number of similar variations.  That may not be considered
good.

In fact, the number of possibilities is not really infinite: the file
name must be at least as long as the request in order to be
considered, so a request with a dozen trailing dots only incurs the
cost of many strstr() calls for files that match the prefix and have
long enough names.

In each case, the content is returned with a Content-Location: header
indicating the canonical filename.

The requirements are:

(1) r->filename up to the first dot must match the real filename up to
    the first dot;
(2) r->filename may not be longer than the real filename;
(3) each .suffix in r->filename must exist (string match) in the real
    filename;
(4) the real filename must correspond to a known mime-type, encoding,
    etc -- which I think means that the final suffix must be known,
    and only suffixes followed by known suffixes are considered.

As a real example, testing with the apache "It worked!" page (named
index.html.LANG), if I request index.html.fr, I get the page back.  If
I request index.fr.html, or just index.fr, I get back the 406 Not
Acceptable page, with a link to index.html.fr, _unless_ I include fr
as an acceptable language.  If I include fr as a language, I can
request /index.fr, /index.fr.html, or /index.html.fr successfully.  If
I include fr as my preferred language, I can additionally request /,
/index, and /index.html.  (As well as the .h, .ht, .htm, .f variants
referred to earlier).  If I request /index.d, I get a 406 with links
to index.html.de and index.html.dk.

As a faked example, consider five files in the DocumentRoot, with no
special customisations to the (MIME) configuration:

files a.b.c, d.e.html, g.h.i.j.k.en, m.n.o.p.q.html, s.t.html.u.v

The following requests have the indicated results:

GET /a            -> not found
GET /a.b          -> not found
GET /a.c          -> not found
GET /a.b.c        -> success
GET /d            -> success
GET /d.e          -> success
GET /d.h          -> success
GET /d.html       -> success
GET /d....html    -> not found
GET /g            -> not found
GET /g.h          -> not found
GET /g.h.i.j.k    -> not found
GET /g.h.i.j.k.en -> success
GET /g.h.i.k.j.en -> not found
GET /m            -> success
GET /m.html       -> success
GET /m.o.q.p.n    -> success
GET /m.o.r.p.n    -> not found
GET /s.t.html.u.v -> success
GET /s            -> not found
GET /s.t.html.u   -> not found

Note that in the "not found" cases there (except for /m.o.r.p.n and
/d....html), the patched code does pass the file down as being
potentially valid -- it is later code which decides that it doesn't
know how to treat the final suffix, and fails it.

As another faked example, with files ..d.f.html and .e.txt, I can
successfully issue GETs for /.d, /.f, /.h, /.e and /.t, as well as
things like /....t. (whether or not the final . there is punctuation). 

So that's it.  If I've missed something obvious, like r->filename being
read-only or something, I'll head back to the drawing board.

All the best,

        f
-- 
Francis Daly        [EMAIL PROTECTED]

--- modules/mappers/mod_negotiation.c.virgin    Sun Feb 18 02:58:52 2001
+++ modules/mappers/mod_negotiation.c   Mon Apr 30 19:06:03 2001
@@ -911,6 +911,10 @@
     struct var_rec mime_info;
     struct accept_rec accept_info;
     void *new_var;
+    char *pos;
+    int pos_len;
+    int not_this_dirent;        /* actually, boolean. */
+    int dots_in_request = 0;    /* 1 == one dot, 2 == some dots */
 
     clean_var_rec(&mime_info);
 
@@ -931,15 +935,81 @@
         return HTTP_FORBIDDEN;
     }
 
+    if ((pos = strchr(filp, '.'))) {
+        dots_in_request = 1;
+        if (strchr(++pos, '.')) {
+            dots_in_request = 2;
+        }
+    }
+
     while (apr_dir_read(&dirent, APR_FINFO_DIRENT, dirp) == APR_SUCCESS) {
         request_rec *sub_req;
         
         /* Do we have a match? */
-        if (strncmp(dirent.name, filp, prefix_len)) {
-            continue;
-        }
-        if (dirent.name[prefix_len] != '.') {
-            continue;
+
+        if (!dots_in_request) {
+
+            /* Given "name", check for "name." */
+            if (strncmp(dirent.name, filp, prefix_len)) {
+                continue;
+            }
+            if (dirent.name[prefix_len] != '.') {
+                continue;
+            }
+
+        } else {
+
+            /* Given "name.suffixes", check for "name." */
+            pos = strchr(filp, '.');
+            pos_len = pos - filp + 1;
+            if (strncmp(dirent.name, filp, pos_len)) {
+                continue;
+            }
+
+            not_this_dirent = 0;
+            filp = ++pos;
+
+            /* Given "name.suf1.suf2.suffix", check for each ".sufN" */
+            if (2 == dots_in_request) {
+                /* Give up now if the request is longer than the file */
+                if (prefix_len > strlen(dirent.name)) {
+                    filp -= pos_len;
+                    continue;
+                }
+
+                while ((pos = strchr(filp, '.'))) {
+                    --filp;
+                    pos_len = pos - filp ;
+
+                    filp[pos_len]='\0';
+                    if (!strstr(dirent.name, filp)) {
+                        not_this_dirent=1;
+                    }
+                    filp[pos_len] = '.';
+
+                    filp += pos_len + 1;
+                    
+                    if (not_this_dirent) {
+                        /* get to next dirent */
+                        break;
+                    }
+                }
+                if (not_this_dirent) {
+                    /* reset filp before trying next dirent */
+                    pos_len = strlen(filp);
+                    filp -= prefix_len - pos_len;
+                    continue;
+                }
+            }
+            --filp;
+            pos_len = strlen(filp);
+
+            /* Check for the final ".suffix" */
+            if (!strstr(dirent.name, filp)) {
+                filp -= prefix_len - pos_len;
+                continue;
+            }
+            filp -= prefix_len - pos_len;
         }
 
         /* Yep.  See if it's something which we have access to, and
