Nicholas O'Connor created HTTPCLIENT-2363:
---------------------------------------------

             Summary: execute(HttpHost, HttpRequest, ResponseHandler) adds port 
to Host header while execute(HttpRequest, ResponseHandler) does not
                 Key: HTTPCLIENT-2363
                 URL: https://issues.apache.org/jira/browse/HTTPCLIENT-2363
             Project: HttpComponents HttpClient
          Issue Type: Bug
          Components: HttpClient (classic)
    Affects Versions: 5.4.2, 5.3.1
            Reporter: Nicholas O'Connor


I've found what I think is a bug, but it could also be expected behavior that's 
surprising from the user's perspective.

[https://gist.github.com/Earth-Turtle/c39c5282af1c8a306099e89091fafea9]

Expected behavior: assume we have some URI {{https://www.example.com/some/path}}. 
{{HttpClient}} provides overloads of {{execute}} that allow the URI to be split 
into host and path components ({{https://www.example.com}} and {{/some/path}}), 
or provided all in the same {{HttpRequest}} (where {{request.getAuthority()}} is 
{{www.example.com}} and {{request.getPath()}} is {{/some/path}}). Using either of 
these two methods should produce exactly the same request, including the same 
Host header.

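To make the two call styles concrete, here is a minimal sketch that uses only {{java.net.URI}} (no HttpClient classes; {{UriSplitSketch}} and {{split}} are hypothetical names) to split the example URI into its host part and path part. It also shows that {{java.net.URI}} reports the port as -1 when none is written, so at this stage neither form carries {{:443}}.

```java
import java.net.URI;

// Sketch only: splits an absolute URI into the "host" and "path" halves that the
// execute(HttpHost, HttpRequest, ...) style takes, using nothing but java.net.URI.
final class UriSplitSketch {

    // Returns {"scheme://authority", "path[?query]"}. The authority is copied as
    // written, so a URI without an explicit port yields a host part without one.
    static String[] split(URI uri) {
        String hostPart = uri.getScheme() + "://" + uri.getRawAuthority();
        String pathPart = uri.getRawPath();
        if (uri.getRawQuery() != null) {
            pathPart = pathPart + "?" + uri.getRawQuery();
        }
        return new String[] {hostPart, pathPart};
    }

    public static void main(String[] args) {
        URI uri = URI.create("https://www.example.com/some/path");
        String[] parts = split(uri);
        System.out.println(parts[0]);      // https://www.example.com
        System.out.println(parts[1]);      // /some/path
        System.out.println(uri.getPort()); // -1: no port was written in the URI
    }
}
```

In the first call style, the host part would become the {{HttpHost}} argument and the path part the request's path; the sketch does not model HttpClient itself.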

Actual behavior: {{execute(HttpHost, HttpRequest, ResponseHandler)}} sets the 
Host header to {{www.example.com:443}}, while 
{{execute(HttpRequest, ResponseHandler)}} sets it to {{www.example.com}}.


Normally, this difference has no effect. In fact, 
https://echo.free.beeceptor.com strips the port from the Host header when 
echoing back the headers of a request. However, I recently came across a server 
that rejected some requests with "Invalid host header, this site must be 
accessed as https://www.example.com". Investigation revealed that it rejected 
requests whose Host header included the port, and accepted only requests where 
no port was given.


The HTTP spec does not mandate either behavior: the client is not required to 
include the port number in the Host header, and the server is not obligated to 
ignore the port and match on the host name alone. This server is likely an 
outlier; still, this hidden behavior in {{HttpClient}} was unexpected.


It appears that this happens when {{ProtocolExec}}, {{AsyncProtocolExec}}, and 
{{MinimalHttpClient}} fill in the authority and scheme of a request that did not 
have them to begin with. Because they fill these from the {{HttpRoute}}'s target 
{{HttpHost}}, and that host also carries port information (usually the scheme 
default, e.g. 443 for https), the port ends up in the request's authority and 
therefore in the Host header.
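
The flow described above can be modelled with a small toy. This is a sketch under stated assumptions, not HttpClient source: the hypothetical {{Target}} stands in for {{HttpHost}}, the hypothetical {{resolvePort}} stands in for the scheme-default port resolution done while building the route, and only http and https defaults are handled. It reproduces the two Host values reported above.

```java
// Toy model, not HttpClient source: the route's target host gets its port resolved
// to the scheme default, and that host is then copied into a request that had no
// authority of its own.
final class AuthorityFillSketch {

    // Stand-in for a host; port -1 means "not specified", as in java.net.URI.
    static final class Target {
        final String scheme;
        final String host;
        final int port;

        Target(String scheme, String host, int port) {
            this.scheme = scheme;
            this.host = host;
            this.port = port;
        }

        // Authority text as it would appear in a Host header.
        String authority() {
            return port == -1 ? host : host + ":" + port;
        }
    }

    // Hypothetical stand-in for scheme-default port resolution during route planning.
    static Target resolvePort(Target t) {
        if (t.port != -1) {
            return t;
        }
        int defaultPort;
        if ("https".equals(t.scheme)) {
            defaultPort = 443;
        } else if ("http".equals(t.scheme)) {
            defaultPort = 80;
        } else {
            throw new IllegalArgumentException("unknown scheme: " + t.scheme);
        }
        return new Target(t.scheme, t.host, defaultPort);
    }

    // execute(HttpHost, request, handler): the request has no authority, so it is
    // filled from the resolved route target.
    static String hostViaSeparateTarget(Target callerTarget) {
        return resolvePort(callerTarget).authority();
    }

    // execute(request, handler): the request already carries the authority from its URI.
    static String hostViaRequestUri(Target uriAuthority) {
        return uriAuthority.authority();
    }

    public static void main(String[] args) {
        Target target = new Target("https", "www.example.com", -1);
        System.out.println(hostViaSeparateTarget(target)); // www.example.com:443
        System.out.println(hostViaRequestUri(target));     // www.example.com
    }
}
```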


This bug is easily worked around by setting the request's authority from the 
target before calling {{execute}}, but the behavior still seems unusual. Was it 
intended?
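
The workaround relies on the fill-in step running only when the request has no authority yet. A toy sketch of that rule, with hypothetical {{Request}} and {{fillAuthorityIfMissing}} names (not HttpClient code):

```java
// Toy model of the workaround, not HttpClient source: the fill-in step only runs
// when the request has no authority, so presetting one keeps ":443" out.
final class PresetAuthoritySketch {

    // Stand-in for a request; a null authority means "not set yet".
    static final class Request {
        String authority;
    }

    // Copies the route target's authority only if the request lacks one.
    static void fillAuthorityIfMissing(Request request, String routeTargetAuthority) {
        if (request.authority == null) {
            request.authority = routeTargetAuthority;
        }
    }

    // Returns the authority the Host header would be built from.
    static String hostHeader(boolean presetFromCallerTarget) {
        Request request = new Request();
        if (presetFromCallerTarget) {
            // Workaround: copy the caller's target as written (no port) before executing.
            request.authority = "www.example.com";
        }
        // The route target, after its port was resolved to the https default.
        fillAuthorityIfMissing(request, "www.example.com:443");
        return request.authority;
    }

    public static void main(String[] args) {
        System.out.println(hostHeader(false)); // www.example.com:443
        System.out.println(hostHeader(true));  // www.example.com
    }
}
```

With the real API, the corresponding step is presumably setting the scheme and authority on the request (e.g. via {{HttpRequest.setScheme}} and {{setAuthority}}) from the target before calling {{execute}}; that mapping is an assumption of this sketch.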



--
This message was sent by Atlassian Jira
(v8.20.10#820010)