On Wed, Oct 30, 2002 at 06:09:04PM -0600, William A. Rowe, Jr. wrote:
> At 04:43 PM 10/30/2002, Bill Stoddard wrote:
> 
> >> At 02:52 PM 10/30/2002, Roy T. Fielding wrote:
> >> >Your patch will simply let the %2F through, but then a later section
> >> >of code will translate them to / and we've opened a security hole
> >> >in the main server.  I'd rather move the rejection code to the
> >> >place where a decision has to be made (like the directory walk),
> >> >but I have no time to do it myself.  I think it is reasonable to
> >> >allow %2F under some circumstances, but only in content handlers
> >> >and only as part of path-info and not within the real directory
> >> >structure.
> >>
> >> That's the right idea... however it doesn't work.
> >>
> >> %2f is distinct from '/' - the rfc defines it as another character altogether.
> >You've lost me.
> 
> 3.2.3 URI Comparison ... 
> 
> *** Characters other than those in the "reserved" and "unsafe" sets (see RFC 2396
> [42]) are equivalent to their ""%" HEX HEX" encoding. For example, the following
> three URIs are equivalent:
>   http://abc.com:80/~smith/home.html 
>   http://ABC.com/%7Esmith/home.html 
>   http://ABC.com:/%7esmith/home.html 
> ***
[...]

What about "normalizing" the URI or URL for comparison purposes at
the outset of the request?

The normalized form could overwrite the original request URI or be stored
separately.  It would simplify the Directory and Location match routines and
would define a consistent ruleset against which URLs are matched.  (It might
require some people to modify their Directory{Match} and Location{Match}
directives a bit, but only if their paths contain $&+,:;=?@ chars.)

-Glenn

Lightly tested:

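#include <stdio.h>
#include <string.h>

/* For standalone testing only; inside httpd these come from httpd.h */
#ifndef OK
#define OK 0
#endif
#ifndef HTTP_BAD_REQUEST
#define HTTP_BAD_REQUEST 400
#endif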

/* partially normalizes URIs used for comparison by decoding %XX hex encodings
 * (Full normalization of a complete URI would include normalizing the case of
 *  chars in the scheme (e.g. "HttP://" => "http://") and chars in the host
 *  name, also optionally adding in the default server port (for consistency) if
 *  not specified.  Multiple consecutive slashes and '.' and '..' path segments
 *  could be removed.  None of these are currently done here.)
 * Also of interest is the observation that the 'A'-'F' hex chars are case
 * insensitive (e.g. %2b is equivalent to %2B for encoding '+')  However, for
 * case-sensitive file systems (containing the %XX value in the file path), this
 * may cause a mismatch.  Likewise for comparing two otherwise equivalent URIs.
 * For comparing URIs, it might make sense to enforce uppercase (or lowercase).
 */
static int normalize_http_uri(char *uri)
{
    char *s = uri;
    char hex[3] = "\0\0";

    while (*s) {

        /* not a hex encoded sequence */
        if (*s != '%') {
            *uri++ = *s++;
            continue;
        }

        /* invalid hex encoded sequence at end of string */
        if (*(s+1) == '\0' || *(s+2) == '\0') {
            if (0) { /* FIXME: replace w/ StrictProtocol(?) directive check */
                return HTTP_BAD_REQUEST;
            }
            else {
                s++;
                *uri++ = '%';
                continue;
            }
        }

        /* RFC 2396 "reserved" chars that should not be unencoded in
         * normalization for URI Comparison (RFC 2616 section 3.2.3)
         * '$' = %24,  '&' = %26,  '+' = %2B,  ',' = %2C,  '/' = %2F
         * ':' = %3A,  ';' = %3B,  '=' = %3D,  '?' = %3F,  '@' = %40
         */
        switch (*(s+1)) {
          case '2':
            switch (*(s+2)) {
              case '4': case 'B': case 'C': case 'F':
              case '6': case 'b': case 'c': case 'f':
                s += 2;
                *uri++ = '%';
                *uri++ = '2';
                *uri++ = *s++;  /* (might instead do *uri++ = toupper(*s++); */
                continue;
            }
            break;
          case '3':
            switch (*(s+2)) {
              case 'A': case 'B': case 'D': case 'F':
              case 'a': case 'b': case 'd': case 'f':
                s += 2;
                *uri++ = '%';
                *uri++ = '3';
                *uri++ = *s++;  /* (might instead do *uri++ = toupper(*s++); */
                continue;
            }
            break;
          case '4':
            if (*(s+2) == '0') {
                s += 3;
                *uri++ = '%';
                *uri++ = '4';
                *uri++ = '0';
                continue;
            }
            break;
        }

        s++;
        hex[0] = *s++;
        hex[1] = *s++;

        /* normalize valid hex encoded sequence */
        if (strspn(hex,"0123456789ABCDEFabcdef") == 2) {
            *uri++ =
              (hex[0] >= 'A' ? ((hex[0] & 0xdf)-'A'+10) : (hex[0]-'0')) << 4
              | (hex[1] >= 'A' ? ((hex[1] & 0xdf)-'A'+10) : (hex[1]-'0'));
        }
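        /* invalid hex encoded sequence */
        else {
            if (0) { /* FIXME: replace w/ StrictProtocol(?) directive check */
                return HTTP_BAD_REQUEST;
            }
            else {
                /* s is already 3 chars past the '%'; rewind so the two chars
                 * after the '%' are rescanned instead of being skipped (and
                 * so s can never step past the terminating NUL)
                 */
                s -= 2;
                *uri++ = '%';
                continue;
            }
        }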
    }
    *uri = '\0';
    return OK;
}
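
For comparison only, the same rules could also be applied without rewriting
either string.  The following is a rough sketch, not what the server does today.
The names hexval(), is_reserved(), next_unit() and uri_path_equiv() are made
up for illustration.  It assumes that escapes of RFC 2396 "reserved" chars
never match their literal form, and (my own assumption, not from the RFC text
quoted above) that %00 and %25 are also kept distinct from a literal NUL or '%'.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* value of one hex digit, or -1 if c is not a hex digit */
static int hexval(unsigned char c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    c = (unsigned char)tolower(c);
    if (c >= 'a' && c <= 'f')
        return c - 'a' + 10;
    return -1;
}

/* RFC 2396 "reserved" chars: ; / ? : @ & = + $ , */
static int is_reserved(int c)
{
    return c != '\0' && strchr(";/?:@&=+$,", c) != NULL;
}

/* Read one comparison unit from *p and advance *p past it.
 * Literal chars and decodable escapes yield the byte value (0..255).
 * Escapes that must stay encoded yield 256 + byte, so %2F and %2f are
 * equal to each other but never equal to a literal '/'.
 * A malformed '%' sequence is treated as a literal '%'.
 */
static int next_unit(const char **p)
{
    const unsigned char *s = (const unsigned char *)*p;
    int hi, lo, c;

    if (s[0] == '%' && (hi = hexval(s[1])) >= 0 && (lo = hexval(s[2])) >= 0) {
        c = (hi << 4) | lo;
        *p += 3;
        return (is_reserved(c) || c == '%' || c == '\0') ? 256 + c : c;
    }
    *p += 1;
    return s[0];
}

/* returns 1 if the two URI paths are equivalent under the rules above */
static int uri_path_equiv(const char *a, const char *b)
{
    while (*a != '\0' && *b != '\0') {
        if (next_unit(&a) != next_unit(&b))
            return 0;
    }
    return *a == '\0' && *b == '\0';
}

/* quick checks based on the RFC 2616 3.2.3 examples quoted above */
static void uri_path_equiv_selftest(void)
{
    assert(uri_path_equiv("/~smith/home.html", "/%7Esmith/home.html"));
    assert(uri_path_equiv("/%7Esmith/home.html", "/%7esmith/home.html"));
    assert(!uri_path_equiv("/a%2Fb", "/a/b"));
    assert(uri_path_equiv("/a%2fb", "/a%2Fb"));
}
```

Comparing on the fly sidesteps the question of whether to overwrite the
original request, at the cost of decoding on every comparison.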
