[ http://issues.apache.org/jira/browse/HTTPCLIENT-587?page=comments#action_12416592 ]

Gordon Mohr commented on HTTPCLIENT-587:
----------------------------------------

> What's wrong with the JDK URI class?

(a) It still has bugs where it fails to implement the spec as well as 
httpclient.URI does. One recent example, still a problem in the current JDK 1.6 
betas:

http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4708535

java.net.URI base = new java.net.URI("http://www.example.com/some/page");
java.net.URI rel = new java.net.URI("");
java.net.URI derel = base.resolve(rel);
derel.toString();
(java.lang.String) http://www.example.com/some/   // INCORRECT

org.apache.commons.httpclient.URI base =
    new org.apache.commons.httpclient.URI("http://www.example.com/some/page");
org.apache.commons.httpclient.URI rel = new org.apache.commons.httpclient.URI("");
org.apache.commons.httpclient.URI derel =
    new org.apache.commons.httpclient.URI(base, rel);
derel.toString();
(java.lang.String) http://www.example.com/some/page  // CORRECT
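For code that has to stay on java.net.URI, the empty-reference case can be 
special-cased by hand. This is a minimal sketch, assuming the RFC 3986 section 
5.2.2 rule that an empty reference resolves to the base URI minus its fragment; 
the class and method names are hypothetical and belong to neither library.

```java
import java.net.URI;

// Hypothetical helper, not part of java.net or HttpClient.
final class EmptyRefResolver {

    // Resolve 'ref' against 'base', special-casing the empty reference.
    // Per RFC 3986 section 5.2.2, an empty reference yields the base URI
    // (path and query intact) with any fragment removed; every other
    // reference is delegated to the JDK unchanged.
    static URI resolve(URI base, URI ref) {
        if (ref.toString().isEmpty()) {
            String s = base.toString();
            int hash = s.indexOf('#');  // in a valid URI, '#' only starts the fragment
            return URI.create(hash >= 0 ? s.substring(0, hash) : s);
        }
        return base.resolve(ref);
    }

    public static void main(String[] args) {
        URI base = URI.create("http://www.example.com/some/page");
        URI empty = URI.create("");
        System.out.println(resolve(base, empty));  // http://www.example.com/some/page
        // For comparison: the JDK's own result, affected by bug 4708535
        // on the JDKs discussed above.
        System.out.println(base.resolve(empty));
    }
}
```

This only patches the single case shown above; other reference forms still go 
through the JDK's resolve().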

(b) java.net.URI and its maintainers reject the idea that the URI class should 
offer any facility for tolerating the kinds of deviations from the formal specs 
that are common in real-world URIs and domain names. 

As one example, domain names with '_' are technically illegal but have often 
been tolerated by DNS-related software, and we have run across functioning 
websites at subdomains with '_' in their names. Browsers handle these sites 
fine, so we want to be able to crawl them. java.net.URI can't help us: it will 
not recognize such a name as a hostname at all.
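A small sketch of what this looks like in practice, using a made-up hostname. 
As far as we can tell, java.net.URI accepts such a URI but files the authority 
as registry-based rather than server-based, so getHost() returns null, and 
parseServerAuthority() rejects it outright.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Illustrative only: "my_host.example.com" is a hypothetical name.
final class UnderscoreHostDemo {
    public static void main(String[] args) throws URISyntaxException {
        // The constructor succeeds: '_' is not a legal hostname character,
        // so the authority is kept as a registry-based name instead.
        URI u = new URI("http://my_host.example.com/index.html");
        System.out.println(u.getAuthority()); // my_host.example.com
        System.out.println(u.getHost());      // null

        // Explicitly requiring a server-based authority fails.
        try {
            u.parseServerAuthority();
        } catch (URISyntaxException e) {
            System.out.println("rejected: " + e.getReason());
        }
    }
}
```

A crawler that keys its politeness and scoping decisions on the host therefore 
gets nothing usable back for these sites.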

Now of course, it's legitimate and useful to provide a class which rigorously 
implements all written standards. Not everyone wants a class that also 
tolerates de facto practices. But that leads us to the ultimate problem with 
java.net.URI:

(c) java.net.URI's licensing and its language-level declarations make it 
resistant to reuse and to adaptation for other legitimate uses.

It's not open source, and major portions of its implementation are 'private' or 
'final'. So it's impossible to reuse 99% of it (such as its various RFC syntax 
character-class definitions, fields, and working parsing code) while also 
either patching bugs like the one in (a) above or overriding the strictness 
that makes it unsuitable for purposes like (b) above. 
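To make the 'final' point concrete, here is a minimal check using only the 
standard reflection API; the class name FinalCheck is illustrative. Because 
java.net.URI itself is declared final, there is no subclass through which to 
override resolve() or relax host parsing.

```java
import java.lang.reflect.Modifier;
import java.net.URI;

final class FinalCheck {
    // A subclass like the following is rejected by the compiler,
    // because java.net.URI is a final class:
    //   class LenientURI extends java.net.URI { ... }
    public static void main(String[] args) {
        System.out.println(Modifier.isFinal(URI.class.getModifiers())); // true
    }
}
```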

In comparison, the org.apache.commons.httpclient.URI class is friendly to 
subclassing (which we've used to work around bugs and change the behavior to 
better fit our problem domain), and if that didn't work with respect to a bug, 
we'd at least have the option of patching it ourselves and redistributing the 
fix. 

So our project would very much miss the pretty-good (and at least serviceable 
when broken) httpclient.URI class if it were dropped in favor of the JDK 
java.net.URI class. 

> Have you looked at HttpCore?

Only a little. Until it has an official test release and comes close to 
matching the HttpClient facilities for cookies, URIs, etc., it probably won't 
be suitable to replace our HttpClient 3.x usage.

(The ability to issue unvalidated request strings would be useful -- but we've 
already patched this into HttpClient 3.x to the extent we need it. Also, we 
still need to perform best-effort, highly-tolerant parsing of URIs into their 
traditional constituent parts for various decisions and kinds of analysis.)
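One well-known way to get a best-effort split that never rejects its input is 
the regular expression given in RFC 3986, Appendix B, which breaks any string 
into scheme, authority, path, query and fragment without validating them. A 
minimal Java sketch follows; the class and method names are hypothetical, and 
this is not claimed to be how httpclient.URI parses internally.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical tolerant splitter based on RFC 3986, Appendix B.
final class TolerantUriSplitter {
    // DOTALL lets the fragment's ".*" span any character, so every
    // input string matches; nothing is validated, only split.
    private static final Pattern RFC3986_SPLIT = Pattern.compile(
        "^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\\?([^#]*))?(#(.*))?",
        Pattern.DOTALL);

    // Returns {scheme, authority, path, query, fragment}; absent parts are null,
    // while the path is always present (possibly empty).
    static String[] split(String uri) {
        Matcher m = RFC3986_SPLIT.matcher(uri);
        if (!m.matches()) {  // not expected to happen with this pattern
            throw new IllegalArgumentException(uri);
        }
        return new String[] { m.group(2), m.group(4), m.group(5), m.group(7), m.group(9) };
    }

    public static void main(String[] args) {
        // '_' in the host is no obstacle here.
        System.out.println(Arrays.toString(split("http://my_host.example.com/a/b?q=1#top")));
        // prints [http, my_host.example.com, /a/b, q=1, top]
    }
}
```

Validation, normalization and host-specific checks can then be layered on top 
as separate, optional steps.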

> derelativizing of relative URIs with a scheme is incorrect
> ----------------------------------------------------------
>
>          Key: HTTPCLIENT-587
>          URL: http://issues.apache.org/jira/browse/HTTPCLIENT-587
>      Project: Jakarta HttpClient
>         Type: Bug
>     Versions: 3.0.1
>     Reporter: Gordon Mohr
>
> URI constructor "public URI(URI base, URI relative) throws URIException" 
> assumes that if given 'relative' URI has a scheme, it should provide an 
> authority and complete path to the constructed URI. However, a URI can have a 
> scheme but still be relative, requiring the authority and base path of the 
> 'base' URI. 
> Demonstration code:
> URI base = new URI("http://www.example.com/some/page");
> URI rel = new URI("http:boo");
> URI derel = new URI(base,rel);
> derel.toString();
> (java.lang.String) http:boo
> In fact, derel should be "http://www.example.com/some/boo". 
> RFC 2396 is a little confused about this; section 3.1 states "Relative URI 
> references are distinguished from absolute URI in that they do not begin with 
> a scheme name." But in section 5, there are several sentences talking about 
> relative URIs that begin with schemes (and how this prevents using relative 
> URIs that have leading path segments that look like scheme identifiers). 
> RFC 3986, which supersedes RFC 2396, removes the implication that a relative 
> URI cannot begin with a scheme, while keeping the other text explicitly 
> discussing relative URIs with schemes. 
> Both Firefox (1.5) and IE (6.0) treat "http:boo" the same as "boo" for 
> purposes of derelativization against an HTTP base URI, which would give the 
> final URI "http://www.example.com/some/boo" in the example above. 
> Even relative URIs like "http:../../boo" are explicitly legal. 
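RFC 3986 section 5.2.2 spells out this browser behaviour as an option: a 
non-strict parser may ignore a scheme in the reference if it is identical to 
the base URI's scheme. Below is a minimal sketch of that option on top of 
java.net.URI; the class and method names are hypothetical. It does not work 
around the empty-reference bug in (a) (a bare "http:" would hit it), and 
dot-segment handling is left to the JDK's resolve().

```java
import java.net.URI;

// Hypothetical helper illustrating RFC 3986's "non-strict" resolution option.
final class NonStrictResolver {

    static URI resolve(URI base, String reference) {
        URI ref = URI.create(reference);
        String scheme = ref.getScheme();
        if (scheme != null && scheme.equalsIgnoreCase(base.getScheme())) {
            // Same scheme as the base: drop "scheme:" and treat the
            // remainder as a relative reference.
            reference = reference.substring(scheme.length() + 1);
        }
        return base.resolve(reference);
    }

    public static void main(String[] args) {
        URI base = URI.create("http://www.example.com/some/page");
        System.out.println(resolve(base, "http:boo")); // http://www.example.com/some/boo
        System.out.println(resolve(base, "ftp:boo"));  // ftp:boo (different scheme, unchanged)
    }
}
```

Stripping the prefix also leaves network-path references intact: 
"http://other.example.org/x" becomes "//other.example.org/x", which resolves 
back to the same absolute URI.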

-- 
This message is automatically generated by JIRA.