[ 
http://issues.apache.org/jira/browse/HTTPCLIENT-587?page=comments#action_12416562
 ] 

Gordon Mohr commented on HTTPCLIENT-587:
----------------------------------------

For all the problems of the 3.0.1 URI class, it is still better than the Sun class.

Is the next version's URI class under development and verified to avoid this 
problem? 

We are still using the 3.x HttpClient in production systems (web crawling), 
since no later releases are officially available. This issue showed up in 
several real crawls. In the usual case (where the page author made a mistake, 
for example "http:www.example.com") the impact is low, but there is a risk of 
important, compliant HREFs not being followed. 

I will try to work up a patch. 

> derelativizing of relative URIs with a scheme is incorrect
> ----------------------------------------------------------
>
>          Key: HTTPCLIENT-587
>          URL: http://issues.apache.org/jira/browse/HTTPCLIENT-587
>      Project: Jakarta HttpClient
>         Type: Bug

>     Versions: 3.0.1
>     Reporter: Gordon Mohr

>
> URI constructor "public URI(URI base, URI relative) throws URIException" 
> assumes that if given 'relative' URI has a scheme, it should provide an 
> authority and complete path to the constructed URI. However, a URI can have a 
> scheme but still be relative, requiring the authority and base path of the 
> 'base' URI. 
> Demonstration code:
> URI base = new URI("http://www.example.com/some/page");
> URI rel = new URI("http:boo");
> URI derel = new URI(base,rel);
> derel.toString();
> (java.lang.String) http:boo
> In fact, derel should be "http://www.example.com/some/boo". 
> RFC2396 is a little confused about this; section 3.1 states "Relative URI 
> references are distinguished from absolute URI in that they do not begin with 
> a scheme name." But, in section 5, there are several sentences talking about 
> relative URIs that begin with schemes (and how this prevents using relative 
> URIs that have leading path segments that look like scheme identifiers). 
> RFC3986, which supersedes RFC2396, removes the implication that a relative URI 
> cannot begin with a scheme, leaving the other text explicitly discussing 
> relative URIs with schemes. 
> Both Firefox (1.5) and IE (6.0) treat "http:boo" the same as "boo" for 
> purposes of derelativization against an HTTP base URI, which would give the 
> final URI "http://www.example.com/some/boo" in the example above. 
> Even relative URIs like "http:../../boo" are explicitly legal. 
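
The lenient behavior described above can be sketched roughly as follows. This
is only an illustration, not the HttpClient fix: it uses java.net.URI rather
than the HttpClient URI class, and the class LenientResolve and the method
resolveLenient are hypothetical names. It assumes the backward-compatible rule
from RFC 3986 section 5.2.2, where a reference whose scheme matches the base
scheme is resolved as if the scheme were absent.

```java
import java.net.URI;

public class LenientResolve {

    // Hypothetical helper (not part of HttpClient): drop the scheme from a
    // reference when it matches the base scheme and no "//" authority
    // follows it, then let java.net.URI do the ordinary relative resolution.
    static URI resolveLenient(URI base, String ref) {
        String scheme = base.getScheme();
        if (scheme != null) {
            String prefix = scheme + ":";
            boolean sameScheme =
                ref.regionMatches(true, 0, prefix, 0, prefix.length());
            boolean hasAuthority = ref.startsWith("//", prefix.length());
            if (sameScheme && !hasAuthority) {
                // "http:boo" becomes "boo"
                ref = ref.substring(prefix.length());
            }
        }
        return base.resolve(ref);
    }

    public static void main(String[] args) {
        URI base = URI.create("http://www.example.com/some/page");
        // prints http://www.example.com/some/boo
        System.out.println(resolveLenient(base, "http:boo"));
        // a full absolute URI is left alone
        System.out.println(resolveLenient(base, "http://other.example/x"));
    }
}
```

Stripping the scheme before resolution keeps the change local: references with
a different scheme, or with an explicit authority, still go through the
resolver unchanged.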


