On Wed, 2007-10-17 at 08:42 -0400, Michael B Allen wrote:
> On 10/17/07, Shahar Evron <[EMAIL PROTECTED]> wrote:
> > Hi Michael,
> >
> > Generally speaking you are right - a URI without a scheme is invalid,
> > but I don't think this is a good enough reason not to represent partial
> > URLs in an object.
>
> The added complexity of allowing virtually any input is going to come
> back to haunt you.
>
I'm not going to allow just any output - but as far as parsing URLs is
concerned, I plan to rely on parse_url() which works quite well for
arbitrary URL-like strings according to my tests.
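
As a rough illustration of what I mean (the sample strings below are made up
for this example, they are not my actual test cases), parse_url() returns
whatever components it can find, whether or not a scheme is present:

```php
<?php
// Sample inputs are hypothetical; they only contrast full and partial URLs.
$samples = array(
    'http://example.com/docs/index.html?lang=en#top', // full URL
    '/docs/index.html?lang=en#top',                   // no scheme or host
    'index.html#top',                                 // relative path + fragment
);

$results = array();
foreach ($samples as $url) {
    // parse_url() returns an associative array containing only the
    // components present in the input (or false on seriously malformed input).
    $results[$url] = parse_url($url);
}

print_r($results);
```

The partial inputs simply produce arrays without 'scheme' and 'host' keys, so
the path, query string and fragment remain accessible.
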
> > It will make it easier for users, for example, to parse HTML files which
> > might contain full or partial URLs, and extract them, without checking
> > whether they are complete or not. Then you could use this partial URI to
> > extract parts of it - like the path, fragment or query string.
>
> $page = HtmlDocument::parse($str);
> $context = $page->getContextUri();
> foreach ($page->getLinks() as $href) {
>     $link = new Zend_Uri($context, $href);
> }
>
> If $href is a full URI it ignores $context.
>
Yes, a part of my proposal suggests a similar use case - but I am not
sure what the 'HtmlDocument' class is. In any case, unless the
document states a base URL, usually you can't tell the context URI from
the HTML document - you're supposed to know it in advance if you fetched
it, but if you didn't, you just can't know.
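
To illustrate the base URL point, here is a sketch that assumes PHP's DOM
extension is available. The function name findBaseHref() is made up for this
example and is not part of any proposal. It returns the href of the first
<base> element, or null when the document does not declare one, in which case
the caller has to supply the context URI from somewhere else:

```php
<?php
// Hypothetical helper, for illustration only; not part of Zend Framework.
function findBaseHref($html)
{
    $doc = new DOMDocument();
    // Real-world HTML is often malformed; collect libxml warnings quietly.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($doc->getElementsByTagName('base') as $base) {
        if ($base->hasAttribute('href')) {
            return $base->getAttribute('href');
        }
    }
    // No declared base: the context URI must be known in advance.
    return null;
}

$withBase = '<html><head><base href="http://example.com/docs/"></head>'
          . '<body><a href="page.html">link</a></body></html>';
$withoutBase = '<html><head><title>t</title></head>'
             . '<body><a href="page.html">link</a></body></html>';

var_dump(findBaseHref($withBase));
var_dump(findBaseHref($withoutBase));
```
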
In any case I am still not convinced there is a good reason to disallow
partial URLs - from my experience, at some point people will complain
about us being too strict, and we will end up changing this. The web is
full of non-standard use cases, and I'm sure we will encounter some
sooner or later. We should be strict and standard-compliant with the
things we generate - but that doesn't mean we can't be loose in what we
accept as input.
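
A minimal sketch of that split, using a hypothetical function name (nothing
here is the proposed Zend_Uri API, and user/password components are ignored
for brevity): parsing accepts partial input, while generating an absolute URI
refuses to proceed unless a scheme and host are present.

```php
<?php
// Hypothetical function, for illustration only.
function buildAbsoluteUri(array $parts)
{
    // Strict on output: an absolute URI needs at least a scheme and a host.
    if (empty($parts['scheme']) || empty($parts['host'])) {
        throw new InvalidArgumentException(
            'Cannot generate an absolute URI without scheme and host'
        );
    }
    $uri = $parts['scheme'] . '://' . $parts['host'];
    if (isset($parts['port'])) {
        $uri .= ':' . $parts['port'];
    }
    $uri .= isset($parts['path']) ? $parts['path'] : '/';
    if (isset($parts['query'])) {
        $uri .= '?' . $parts['query'];
    }
    if (isset($parts['fragment'])) {
        $uri .= '#' . $parts['fragment'];
    }
    return $uri;
}

// Loose on input: a partial URL still parses and its parts are usable.
$partial = parse_url('/search?q=zend#results');
echo $partial['query'], "\n";

// Strict on output: the same partial URL is rejected when generating.
try {
    buildAbsoluteUri($partial);
} catch (InvalidArgumentException $e) {
    echo 'Rejected: ', $e->getMessage(), "\n";
}
```
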
Shahar.
> Java may have some bloated libraries but Sun's URL class seems well
> designed to me.
>
> Mike
