On 10/16/07, Shahar Evron <[EMAIL PROTECTED]> wrote:
> - Represent abstract or incomplete URIs (such as one might find in an
> HTML page - <a href="../foo/bar">).
Hi Shahar,
I think building URIs from only a relative path is a mistake -
conceptually and programmatically. A URI is not valid unless it has at
least a scheme.
However, it does make sense to allow constructing a URI from a URI
object and a relative path. Meaning you retrieve a context URI
representing the HTML page and then construct a new URI object with it
and the relative path '../foo/bar' [1].
That's how Java's URL class does it BTW.
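To make the idea concrete, here is a rough C sketch of resolving a relative path against a context URI. The name resolve_relative and its signature are made up for illustration. It assumes the context has the form scheme://authority/path with no query or fragment, and it only handles the path part, so treat it as a sketch of the concept and not a complete resolver.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not from any library): resolve the relative path
 * 'rel' against the context URI 'base' and write the result to 'out'.
 * Returns the length of the result, or -1 if the context has no scheme
 * or the output buffer is too small. */
int
resolve_relative(const char *base, const char *rel, char *out, size_t outsz)
{
    const char *p, *path, *last;
    size_t n, root, len;

    /* Without a scheme there is no context to resolve against. */
    if ((p = strstr(base, "://")) == NULL)
        return -1;

    /* The path starts at the first '/' after the authority, if any. */
    if ((path = strchr(p + 3, '/')) == NULL)
        path = base + strlen(base);
    root = (size_t)(path - base);

    /* Keep the context up to its last '/', or only scheme and
     * authority if 'rel' is itself an absolute path. */
    last = (rel[0] == '/') ? NULL : strrchr(path, '/');
    n = last ? (size_t)(last - base) : root;
    if (n + 2 > outsz)
        return -1;
    memcpy(out, base, n);
    out[n++] = '/';

    /* Walk 'rel' one segment at a time; 'out' always ends in '/' here. */
    while (*rel) {
        len = strcspn(rel, "/");
        if (len == 0 || (len == 1 && rel[0] == '.')) {
            /* empty or "." segment: nothing to emit */
        } else if (len == 2 && rel[0] == '.' && rel[1] == '.') {
            /* "..": drop the previous segment, never climbing above the root */
            if (n > root + 1) {
                n--;
                while (out[n - 1] != '/')
                    n--;
            }
        } else {
            if (n + len + 2 > outsz)
                return -1;
            memcpy(out + n, rel, len);
            n += len;
            if (rel[len] == '/')
                out[n++] = '/';
        }
        rel += len;
        if (*rel == '/')
            rel++;
    }
    out[n] = '\0';
    return (int)n;
}
```

Calling resolve_relative("http://example.com/a/b/page.html", "../foo/bar", buf, sizeof buf) leaves "http://example.com/a/foo/bar" in buf, while passing "../foo/bar" as the context fails because there is no scheme to anchor it to.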
Mike
[1] Beware that path canonicalization is notoriously tricky and has in
many instances led to security vulnerabilities in high-profile
products. For a good path canonicalization routine, a state machine
usually turns out to be most correct (and if it's not, it's easy to fix
without screwing something else up). The following is C, but of course
it's not terribly difficult to translate this to PHP (e.g. *dst++ =
*src++ becomes dst[di++] = src[si++]).

#define ST_START     1
#define ST_SEPARATOR 2
#define ST_NORMAL    3
#define ST_DOT1      4
#define ST_DOT2      5

int
path_canon(const str_t *src, const str_t *slim,
        str_t *dst, str_t *dlim,
        int srcsep, int dstsep)
{
    str_t *start = dst, *prev;
    int state = ST_START;

    while (src < slim && dst < dlim) {
        switch (state) {
            case ST_START:
                state = ST_SEPARATOR;
                if (*src == srcsep) {
                    /* keep a single leading separator */
                    *dst++ = dstsep; src++;
                    break;
                }
                /* fall through */
            case ST_SEPARATOR:
                if (*src == '\0') {
                    *dst = '\0';
                    return dst - start;
                } else if (*src == srcsep) {
                    /* collapse repeated separators */
                    src++;
                    break;
                } else if (*src == '.') {
                    state = ST_DOT1;
                } else {
                    state = ST_NORMAL;
                }
                *dst++ = *src++;
                break;
            case ST_NORMAL:
                if (*src == '\0') {
                    *dst = '\0';
                    return dst - start;
                } else if (*src == srcsep) {
                    state = ST_SEPARATOR;
                    *dst++ = dstsep; src++;
                    break;
                }
                *dst++ = *src++;
                break;
            case ST_DOT1:
                if (*src == '\0') {
                    /* trailing "." segment: drop the dot */
                    dst--;
                    *dst = '\0';
                    return dst - start;
                } else if (*src == srcsep) {
                    /* "./" segment: drop the dot */
                    state = ST_SEPARATOR;
                    dst--;
                    break;
                } else if (*src == '.') {
                    state = ST_DOT2;
                    *dst++ = *src++;
                    break;
                }
                state = ST_NORMAL;
                *dst++ = *src++;
                break;
            case ST_DOT2:
                if (*src == '\0' || *src == srcsep) {
                    /* note src is not advanced in this case */
                    state = ST_SEPARATOR;
                    dst -= 2;              /* drop the ".." itself */
                    prev = dst - 1;
                    if (dst == start || prev == start) {
                        break;             /* already at the top */
                    }
                    /* back up over the previous segment */
                    do {
                        dst--;
                        prev = dst - 1;
                    } while (dst > start && *prev != dstsep);
                    break;
                }
                state = ST_NORMAL;
                *dst++ = *src++;
                break;
        }
    }

    /* ran out of source or destination space */
    PMNO(errno = ERANGE);
    return -1;
}
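
The pointer-to-index translation mentioned in the footnote can be sketched on a much smaller state machine. The following is a made-up example (squeeze_sep and the SQ_ states are not part of path_canon) that only collapses runs of a separator, written with indices the way a PHP port would probably look:

```c
#include <stddef.h>

#define SQ_NORMAL    1
#define SQ_SEPARATOR 2

/* Hypothetical example: copy src to dst, collapsing runs of 'sep' into
 * one. Uses indices (si, di) instead of pointers. Returns the length
 * written, or -1 if dst (of size dlim) is too small. */
int
squeeze_sep(const char *src, char *dst, size_t dlim, int sep)
{
    size_t si = 0, di = 0;
    int state = SQ_NORMAL;

    while (di < dlim) {
        if (src[si] == '\0') {
            dst[di] = '\0';
            return (int)di;
        }
        switch (state) {
            case SQ_SEPARATOR:
                if (src[si] == sep) {
                    si++;              /* swallow the repeated separator */
                    break;
                }
                state = SQ_NORMAL;
                /* fall through */
            case SQ_NORMAL:
                if (src[si] == sep)
                    state = SQ_SEPARATOR;
                dst[di++] = src[si++]; /* was: *dst++ = *src++ */
                break;
        }
    }
    return -1;
}
```

With a separator of '/', the input "a//b///c" comes out as "a/b/c". The same shape carries over to the full routine: each state only looks at the current character and decides whether to copy it, skip it, or back up.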
--
Michael B Allen
PHP Active Directory SPNEGO SSO
http://www.ioplex.com/