Nicholas O'Connor created HTTPCLIENT-2363:
---------------------------------------------

             Summary: execute(HttpHost, HttpRequest, ResponseHandler) adds port to Host header while execute(HttpRequest, ResponseHandler) does not
                 Key: HTTPCLIENT-2363
                 URL: https://issues.apache.org/jira/browse/HTTPCLIENT-2363
             Project: HttpComponents HttpClient
          Issue Type: Bug
          Components: HttpClient (classic)
   Affects Versions: 5.4.2, 5.3.1
            Reporter: Nicholas O'Connor

I've found what I think is a bug, but it could also be expected behavior that's surprising from the user's perspective.

https://gist.github.com/Earth-Turtle/c39c5282af1c8a306099e89091fafea9

Expected behavior: assume we have some URI {{https://www.example.com/some/path}}. {{HttpClient}} provides overloads of execute that allow the URI to be split into host and path components ("{{https://www.example.com}}", "{{/some/path}}"), or provided all in the same {{HttpRequest}} (where {{request.getAuthority()}} is "{{https://www.example.com}}" and {{request.getUri()}} is "{{/some/path}}"). Either of these two methods should produce exactly the same result.

Actual behavior: {{execute(HttpHost, HttpRequest, ResponseHandler)}} sets the Host header to {{www.example.com:443}}, while {{execute(HttpRequest, ResponseHandler)}} sets it to {{www.example.com}}.

Normally, this difference has no effect. In fact, https://echo.free.beeceptor.com strips the port from the Host header when echoing back the headers of a request. However, I've recently come across a server that rejected some requests with "Invalid host header, this site must be accessed as https://www.example.com". Investigation revealed that it rejected requests where the port was included in the Host header, and would only accept requests where no port was given.

This behavior is not defined by the HTTP spec; the port number is not required in the Host header sent by the client, nor is the server obligated to respect the host portion without the port. This case feels like an outlier from usual behavior; however, this hidden behavior from {{HttpClient}} was unexpected.

It appears that this happens when {{ProtocolExec}}, {{AsyncProtocolExec}}, and {{MinimalHttpClient}} fill in the authority and scheme for a request that didn't have one to begin with. Because they fill these in from the {{HttpRoute}}'s target {{HttpHost}}, which also carries port information (usually the scheme default), the port ends up in the request's authority.

This bug is very easily worked around by simply setting the request's authority from the target before calling execute, but it still seems unusual. Was this behavior intended?
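
To make the difference between the two Host values concrete, here is a minimal, library-independent sketch of building a Host header value that leaves out the port when it is the scheme default. It does not use HttpClient and is not how {{ProtocolExec}} is implemented; the class name {{HostHeaderSketch}} and the methods {{defaultPort}} and {{hostHeader}} are hypothetical names chosen for illustration.

```java
import java.util.Locale;

// Illustrative sketch only: not part of HttpClient, names are hypothetical.
public class HostHeaderSketch {

    // Default ports for the two schemes discussed in the report; -1 for anything else.
    static int defaultPort(String scheme) {
        switch (scheme.toLowerCase(Locale.ROOT)) {
            case "http":
                return 80;
            case "https":
                return 443;
            default:
                return -1;
        }
    }

    // Builds the Host header value, omitting the port when it is undefined (negative)
    // or equal to the scheme default.
    static String hostHeader(String scheme, String host, int port) {
        if (port < 0 || port == defaultPort(scheme)) {
            return host;
        }
        return host + ":" + port;
    }

    public static void main(String[] args) {
        // Scheme-default port is dropped.
        System.out.println(hostHeader("https", "www.example.com", 443));  // www.example.com
        // Non-default port is kept.
        System.out.println(hostHeader("https", "www.example.com", 8443)); // www.example.com:8443
    }
}
```

With a rule like this, both execute overloads would end up sending {{www.example.com}} for the URI above, which is the value the strict server accepted.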