On 10/16/07, Shahar Evron <[EMAIL PROTECTED]> wrote:
> - Represent abstract or incomplete URIs (such as one might find in an
>   HTML page - <a href="../foo/bar">).

Hi Shahar,

I think building URIs from only a relative path is a mistake -
conceptually and programmatically. A URI is not valid unless it has at
least a scheme.

However, it does make sense to allow constructing a URI from a URI
object and a relative path. That is, you take a context URI
representing the HTML page and construct a new URI object from it
and the relative path '../foo/bar' [1].

That's how Java's URL class does it BTW, via its
URL(URL context, String spec) constructor.
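
To make the idea concrete, here is a rough sketch in C of just the
merge step: take the path of the context URI, drop everything after
its last '/', and append the relative path. The name merge_path and
its signature are made up for illustration and do not come from any
library. It only handles the path component and assumes the scheme
and host are carried over from the context URI elsewhere.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not from any library): merge a relative path
 * with the path of a context URI. Everything after the last '/' in
 * base is dropped and rel is appended. A rel that starts with '/'
 * replaces the base path entirely.
 * Returns the length of the result, or -1 if out is too small.
 */
int
merge_path(const char *base, const char *rel, char *out, size_t outsz)
{
    const char *slash = strrchr(base, '/');
    size_t dirlen = (slash != NULL) ? (size_t)(slash - base) + 1 : 0;
    size_t rellen = strlen(rel);

    if (rel[0] == '/') {
        dirlen = 0;                 /* absolute path: ignore base */
    }
    if (dirlen + rellen + 1 > outsz) {
        return -1;                  /* no room for result plus '\0' */
    }
    memcpy(out, base, dirlen);
    memcpy(out + dirlen, rel, rellen + 1);  /* copies the '\0' too */
    return (int)(dirlen + rellen);
}
```

With a base path of "/docs/page.html" and '../foo/bar' this gives
"/docs/../foo/bar". The result still contains the '..' segment, so it
has to go through a canonicalization pass (see [1]) before it can be
used or compared.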

Mike

[1] Beware that path canonicalization is notoriously tricky and has in
many instances led to security vulnerabilities in high-profile
products. For a good path canonicalization routine, a state machine
usually turns out to be the most correct (and if it's not, it's easy
to fix without screwing something else up). The following is C, but
it's not terribly difficult to translate to PHP (e.g. *dst++ = *src++
becomes dst[di++] = src[si++]).

#define ST_START     1
#define ST_SEPARATOR 2
#define ST_NORMAL    3
#define ST_DOT1      4
#define ST_DOT2      5

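/*
 * Canonicalize the '\0'-terminated path in [src, slim) into
 * [dst, dlim), collapsing repeated separators and resolving "." and
 * ".." components. Returns the length of the result, or -1 if the
 * terminating '\0' is not reached before src hits slim or dst hits
 * dlim.
 */
int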
path_canon(const str_t *src, const str_t *slim,
        str_t *dst, str_t *dlim,
        int srcsep, int dstsep)
{
    str_t *start = dst, *prev;
    int state = ST_START;

    while (src < slim && dst < dlim) {
        switch (state) {
            case ST_START:
                state = ST_SEPARATOR;
                if (*src == srcsep) {
                    *dst++ = dstsep; src++;
                    break;
                }
                /* no leading separator: fall through */
            case ST_SEPARATOR:
                if (*src == '\0') {
                    *dst = '\0';
                    return dst - start;
                } else if (*src == srcsep) {
                    src++;
                    break;
                } else if (*src == '.') {
                    state = ST_DOT1;
                } else {
                    state = ST_NORMAL;
                }
                *dst++ = *src++;
                break;
            case ST_NORMAL:
                if (*src == '\0') {
                    *dst = '\0';
                    return dst - start;
                } else if (*src == srcsep) {
                    state = ST_SEPARATOR;
                    *dst++ = dstsep; src++;
                    break;
                }
                *dst++ = *src++;
                break;
            case ST_DOT1:
                if (*src == '\0') {
                    dst--;
                    *dst = '\0';
                    return dst - start;
                } else if (*src == srcsep) {
                    state = ST_SEPARATOR;
                    dst--;
                    break;
                } else if (*src == '.') {
                    state = ST_DOT2;
                    *dst++ = *src++;
                    break;
                }
                state = ST_NORMAL;
                *dst++ = *src++;
                break;
            case ST_DOT2:
                if (*src == '\0' || *src == srcsep) {
                    /* note src is not advanced in this case */
                    state = ST_SEPARATOR;
                    dst -= 2;
                    prev = dst - 1;
                    if (dst == start || prev == start) {
                        break;
                    }
                    do {
                        dst--;
                        prev = dst - 1;
                    } while (dst > start && *prev != dstsep);
                    break;
                }
                state = ST_NORMAL;
                *dst++ = *src++;
                break;
        }
    }

    PMNO(errno = ERANGE);
    return -1;
}

-- 
Michael B Allen
PHP Active Directory SPNEGO SSO
http://www.ioplex.com/
