URI Absolutization does not follow browser behavior
---------------------------------------------------

                 Key: HTTPCLIENT-679
                 URL: https://issues.apache.org/jira/browse/HTTPCLIENT-679
             Project: HttpComponents HttpClient
          Issue Type: Bug
          Components: HttpClient
    Affects Versions: 3.1 RC1
         Environment: HttpClient 3.1 RC1, JDK 1.6.0, Ubuntu 7.04
            Reporter: Jeff Dalton


This was encountered using Heritrix to crawl a prominent website.

The URI produced by HttpClient's URI(base, relative) constructor does not
follow browser behavior:
import org.apache.commons.httpclient.URI;

URI newUrl = new URI(
        new URI("http://www.theirwebsite.com/browse/results?type=browse&att=1"),
        "?sort=0&offset=11&pageSize=10");

This results in newUrl:
http://www.theirwebsite.com/browse/?sort=0&offset=11&pageSize=10

Based on the behavior of Firefox and IE, the expected result is:
http://www.theirwebsite.com/browse/results?sort=0&offset=11&pageSize=10

Rather than dropping the last path segment, these browsers keep the entire
base path (/browse/results) and replace only its query, so a reference that
starts with "?" does not need to repeat the file name before the query.
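To make the difference concrete, here is a minimal Java sketch, assuming a
base of the form scheme://authority/path and a reference that consists only
of a query. The class QueryOnlyResolver and its methods are hypothetical
illustrations, not part of HttpClient. resolveRfc3986 follows RFC 3986
section 5.2.2 (an empty reference path keeps the base path), which gives the
Firefox/IE result; resolveRfc2396 follows the RFC 2396 examples, where "?y"
against http://a/b/c/d;p?q gives http://a/b/c/?y, and produces the same
string that HttpClient 3.1 RC1 returned above.

```java
// Hypothetical sketch, not HttpClient code: resolves a query-only relative
// reference ("?...") against an absolute "scheme://authority/path" base URI.
public final class QueryOnlyResolver {

    private QueryOnlyResolver() {}

    // RFC 3986, section 5.2.2: an empty reference path keeps the whole base
    // path; only the query (and any fragment) come from the reference.
    // This matches the Firefox/IE behavior described in the report.
    static String resolveRfc3986(String base, String queryRef) {
        requireQueryOnly(queryRef);
        return stripQueryAndFragment(base) + queryRef;
    }

    // RFC 2396 reading (Appendix C: "?y" against http://a/b/c/d;p?q gives
    // http://a/b/c/?y): the last path segment is dropped. This yields the
    // same string reported for HttpClient 3.1 RC1.
    static String resolveRfc2396(String base, String queryRef) {
        requireQueryOnly(queryRef);
        String prefix = stripQueryAndFragment(base);
        int schemeSep = prefix.indexOf("://");
        if (schemeSep < 0 || prefix.indexOf('/', schemeSep + 3) < 0) {
            throw new IllegalArgumentException("expected scheme://authority/path: " + base);
        }
        // Keep everything up to and including the last '/' of the base path.
        return prefix.substring(0, prefix.lastIndexOf('/') + 1) + queryRef;
    }

    private static void requireQueryOnly(String ref) {
        if (!ref.startsWith("?")) {
            throw new IllegalArgumentException("not a query-only reference: " + ref);
        }
    }

    // In an absolute URI, the first '?' or '#' starts the query or fragment.
    private static String stripQueryAndFragment(String uri) {
        int cut = uri.length();
        int q = uri.indexOf('?');
        int f = uri.indexOf('#');
        if (q >= 0) cut = Math.min(cut, q);
        if (f >= 0) cut = Math.min(cut, f);
        return uri.substring(0, cut);
    }

    public static void main(String[] args) {
        String base = "http://www.theirwebsite.com/browse/results?type=browse&att=1";
        String ref = "?sort=0&offset=11&pageSize=10";
        // Prints http://www.theirwebsite.com/browse/results?sort=0&offset=11&pageSize=10
        System.out.println(resolveRfc3986(base, ref));
        // Prints http://www.theirwebsite.com/browse/?sort=0&offset=11&pageSize=10
        System.out.println(resolveRfc2396(base, ref));
    }
}
```

A crawler could, for example, special-case references that start with "?"
this way before handing them to HttpClient; a general resolver would also
need to handle relative paths, dot segments, and fragment-only references.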

Because HttpClient's current behavior differs from that of current browsers,
a crawler that uses HttpClient's URI class cannot correctly crawl certain
websites.



-- 
This message is automatically generated by JIRA.