[
https://issues.apache.org/jira/browse/HTTPCLIENT-1486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13939511#comment-13939511
]
William Porter commented on HTTPCLIENT-1486:
--------------------------------------------
I am also of the opinion that implementing a more lenient URI parse is not what
we want. I think the issue is that the current URI parsing is actually too
lenient. The internal java.net.URI representation is unable to parse a host
and port from that URI, but the org.apache.http.client.utils.URIUtils
extractHost() has custom parsing logic that leads to this scenario. Rather
than parsing a host and port through custom means, maybe an exception should be
thrown.
This method is already parsing the URI more than once in this code (in
URIUtils.java extractHost(final URI uri)):
{code}
298 String host = uri.getHost();
299 if (host == null) { // normal parse failed; let's do it ourselves
300 // authority does not seem to care about the valid
character-set for host names
301 host = uri.getAuthority();
302 if (host != null) {
303 // Strip off any leading user credentials
304 final int at = host.indexOf('@');
305 if (at >= 0) {
306 if (host.length() > at+1 ) {
307 host = host.substring(at+1);
308 } else {
309 host = null; // @ on its own
310 }
311 }
312 // Extract the port suffix, if present
313 if (host != null) {
314 final int colon = host.indexOf(':');
315 if (colon >= 0) {
316 final int pos = colon + 1;
317 int len = 0;
318 for (int i = pos; i < host.length(); i++) {
319 if (Character.isDigit(host.charAt(i))) {
320 len++;
321 } else {
322 break;
323 }
324 }
325 if (len > 0) {
326 try {
327 port =
Integer.parseInt(host.substring(pos, pos + len));
328 } catch (final NumberFormatException ex) {
329 }
330 }
331 host = host.substring(0, colon);
332 }
333 }
334 }
335 }
...
{code}
> Quirky Behavior in URIUtils leads to Improper Request Execution
> ---------------------------------------------------------------
>
> Key: HTTPCLIENT-1486
> URL: https://issues.apache.org/jira/browse/HTTPCLIENT-1486
> Project: HttpComponents HttpClient
> Issue Type: Bug
> Components: HttpClient
> Affects Versions: 4.3.3
> Reporter: William Porter
> Priority: Minor
>
> While executing a HttpUriRequest with a ClosableHttpClient, malformed URIs
> can lead to HTTP requests being executed for unexpected resources. The root
> issue is in the extractHost() method in URIUtils, and is demonstracted by the
> following example.
> {code:title=Main.java|borderStyle=solid}
> import java.io.IOException;
> import java.net.URI;
> import java.net.URISyntaxException;
> import org.apache.http.HttpHost;
> import org.apache.http.HttpResponse;
> import org.apache.http.client.ClientProtocolException;
> import org.apache.http.client.HttpClient;
> import org.apache.http.client.methods.HttpGet;
> import org.apache.http.client.methods.HttpUriRequest;
> import org.apache.http.client.utils.URIUtils;
> import org.apache.http.impl.client.HttpClientBuilder;
> import org.apache.log4j.BasicConfigurator;
> import org.junit.Assert;
> import org.slf4j.Logger;
> import org.slf4j.LoggerFactory;
> public class Main {
>
> private static final Logger LOG = LoggerFactory.getLogger(Main.class);
> public static void main(String [] args) {
>
> // Set up Log4J logging
> BasicConfigurator.configure();
>
> try {
>
> // The following is a strange URI string that is
> possibly a typo that
> // doesn't include the / between the authority and the
> 'intended' path
> final String strangeUriString =
> "http://www.example.com:80somepath/someresource.html";
> // Whereas it doesn't neccesarily seem like strange
> behavior to resolve the
> // host and port as www.example.com and 80 from the
> authority, it can have unintended
> // consequences at higher levels of indirection
> Assert.assertEquals(new HttpHost("www.example.com",
> 80), URIUtils.extractHost(new URI(strangeUriString)));
>
> // Now we construct a request with the strange URI
> String
> HttpUriRequest request = new HttpGet(strangeUriString);
>
> // We create a CloseableHttpClient to execute the
> request
> final HttpClientBuilder builder =
> HttpClientBuilder.create();
> HttpClient client = builder.build();
>
> // Here, the request is executed, but is actually a GET
> /someresource.html
> // on www.example.com:80 since part of the intended
> path was considered part
> // of the authority by the URI class, but disregarded
> by URIUtils
> final HttpResponse response = client.execute(request);
> LOG.info("Response: {}",
> response.getStatusLine().toString());
>
>
> } catch (final URISyntaxException e) {
> LOG.error("UriSyntaxException: {}", e.getMessage());
> } catch (final ClientProtocolException e) {
> LOG.error("ClientProtocolException: {}",
> e.getMessage());
> } catch (final IOException e) {
> LOG.error("IOException: {}", e.getMessage());
> }
>
> }
> }
> {code}
> This bug may be introduced by the fix for
> https://issues.apache.org/jira/browse/HTTPCLIENT-1166. It might be
> advantageous to throw an exception in this case rather than be lenient with
> the host and port parsing, but further discussion might be merited based on
> the comments in the aforementioned issue.
> Here is some debug output to show the request is actually a GET
> /someresource.html
> 87 [main] DEBUG org.apache.http.wire - http-outgoing-0 >> "GET
> /someresource.html HTTP/1.1[\r][\n]"
> 87 [main] DEBUG org.apache.http.wire - http-outgoing-0 >> "Host:
> www.example.com:80[\r][\n]"
> 87 [main] DEBUG org.apache.http.wire - http-outgoing-0 >> "Connection:
> Keep-Alive[\r][\n]"
> 87 [main] DEBUG org.apache.http.wire - http-outgoing-0 >> "User-Agent:
> Apache-HttpClient/4.3.3 (java 1.5)[\r][\n]"
> 87 [main] DEBUG org.apache.http.wire - http-outgoing-0 >> "Accept-Encoding:
> gzip,deflate[\r][\n]"
> 87 [main] DEBUG org.apache.http.wire - http-outgoing-0 >> "[\r][\n]"
--
This message was sent by Atlassian JIRA
(v6.2#6252)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]