[
https://issues.apache.org/jira/browse/HTTPCLIENT-679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12517586
]
Gordon Mohr commented on HTTPCLIENT-679:
----------------------------------------
Notably, the browsers are following RFC 3986 here. Taking an example from
RFC 3986 section 5.4.1 ("Normal Examples"):
URI uri = new URI(new URI("http://a/b/c/d;p?q"), "?y");
uri.toString(); // returns "http://a/b/c/?y"; RFC 3986 specifies
                // "http://a/b/c/d;p?y"
> URI Absolutization does not follow browser behavior
> ---------------------------------------------------
>
> Key: HTTPCLIENT-679
> URL: https://issues.apache.org/jira/browse/HTTPCLIENT-679
> Project: HttpComponents HttpClient
> Issue Type: Bug
> Components: HttpClient
> Affects Versions: 3.1 RC1
> Environment: HttpClient 3.1 RC1,
> JDK 1.6.0
> Ubuntu 7.04
> Reporter: Jeff Dalton
>
> This was encountered using Heritrix to crawl a prominent website.
> The URI resulting from the HttpClient URI constructor (base, relative) does
> not follow browser behavior:
> URI newUrl = new URI(
>     new URI("http://www.theirwebsite.com/browse/results?type=browse&att=1"),
>     "?sort=0&offset=11&pageSize=10");
> Results in newUrl:
> http://www.theirwebsite.com/browse/?sort=0&offset=11&pageSize=10
> The desired behavior based on Firefox and IE should be:
> http://www.theirwebsite.com/browse/results?sort=0&offset=11&pageSize=10
> These browsers treat the question mark much like a directory separator and
> do not require a file to be specified before the query.
> HttpClient's current behavior does not correspond to current browser behavior
> and leads to an inability to crawl certain websites if HttpClient's URI class
> is used.
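
For reference, the result the reporter expects is what RFC 3986 section 5.2.2 prescribes for a reference with an empty path and a query: keep the base path unchanged and replace only the query. Below is a minimal sketch of that one case, not a full resolver and not HttpClient code. The class QueryOnlyResolve and the method resolveQueryOnly are hypothetical names made up for illustration, and the sketch assumes the base is an absolute URI string.

```java
public class QueryOnlyResolve {

    // Hypothetical helper, not part of HttpClient. Handles only references
    // that begin with '?', i.e. an empty path plus a query (RFC 3986, 5.2.2).
    static String resolveQueryOnly(String base, String ref) {
        if (!ref.startsWith("?")) {
            throw new IllegalArgumentException("expected a query-only reference: " + ref);
        }
        // Drop the base fragment, if any; it is never inherited.
        int hash = base.indexOf('#');
        String noFragment = hash >= 0 ? base.substring(0, hash) : base;
        // Drop the base query, keeping scheme, authority and the full path.
        int q = noFragment.indexOf('?');
        String schemeAuthorityPath = q >= 0 ? noFragment.substring(0, q) : noFragment;
        // The reference supplies the new query (and fragment, if it has one).
        return schemeAuthorityPath + ref;
    }

    public static void main(String[] args) {
        // Example from RFC 3986 section 5.4.1.
        System.out.println(resolveQueryOnly("http://a/b/c/d;p?q", "?y"));
        // Example from this issue.
        System.out.println(resolveQueryOnly(
                "http://www.theirwebsite.com/browse/results?type=browse&att=1",
                "?sort=0&offset=11&pageSize=10"));
    }
}
```

The important point is that the last path segment ("d;p" or "results") is kept, whereas the behavior reported in this issue drops it.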
--
This message is automatically generated by JIRA.