Hello,

After upgrading to HttpClient 4.5.8+ we found that requests whose URLs
contain Cyrillic characters are broken. Below is a simple test that exposes
the issue with HttpClient 4.5.8:
===================================
public void cyrillicSymbolsExtraTest() throws Exception {
  String urlStr = "http://google.com/кириллица-2019/?q=кириллица-2019";
  URL url = new URL(urlStr);
  HttpUriRequest req = new HttpGet(url.toString());

  // Prints "http://google.com/%D0%BA%D0%B8%D1%80%D0%B8%D0%BB%D0%BB%D0%B8%D1%86%D0%B0-2019/?q=%D0%BA%D0%B8%D1%80%D0%B8%D0%BB%D0%BB%D0%B8%D1%86%D0%B0-2019"
  System.out.println(req.getRequestLine().getUri());

  HttpClientContext context = HttpClientContext.create();
  HttpClient client = HttpClients.custom().build();
  HttpResponse resp = client.execute(req, context);

  Assert.assertEquals(req.getRequestLine().getUri(),
      "http://google.com" + context.getRequest().getRequestLine().getUri());
  // Expected: http://google.com/%D0%BA%D0%B8%D1%80%D0%B8%D0%BB%D0%BB%D0%B8%D1%86%D0%B0-2019/?q=%D0%BA%D0%B8%D1%80%D0%B8%D0%BB%D0%BB%D0%B8%D1%86%D0%B0-2019
  // Actual:   http://google.com/:8@8;;8F0-2019/?q=%D0%BA%D0%B8%D1%80%D0%B8%D0%BB%D0%BB%D0%B8%D1%86%D0%B0-2019
}
===================================

With HttpClient 4.5.7 the test passes.

Yes, I know that non-ASCII characters aren't allowed in URLs. But the
following aspects of the behavior shown above worry me:

1. req.getRequestLine().getUri() returns the correctly URL-encoded URI, but
the request is sent to an address with an incorrect path:
"http://google.com/:8@8;;8F0-2019/".

2. If the URL is invalid, it seems very odd to me to send the request to
a broken URL instead of returning an error.

3. There is an inconsistency between the encoding of the URL path part and
the URL query part -- the path part becomes broken while the query part is
correctly URL-encoded.
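
For what it's worth, the broken path in point 1 looks consistent with each
Cyrillic character being narrowed to its low 8 bits instead of being
UTF-8 encoded. This is only an observation about the output, not a confirmed
diagnosis of HttpClient internals. A minimal sketch that reproduces the same
garbling (the class LowByteDemo and the helper narrowToLowByte are
hypothetical names made up for this illustration):

```java
public class LowByteDemo {

    // Hypothetical helper: keeps only the low 8 bits of every UTF-16 code unit,
    // which is what a careless (byte) cast of a char would do.
    static String narrowToLowByte(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            sb.append((char) (s.charAt(i) & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Unicode escapes for "кириллица-2019", so the demo does not depend
        // on the source file encoding.
        String segment = "\u043A\u0438\u0440\u0438\u043B\u043B\u0438\u0446\u0430-2019";

        // Prints ":8@8;;8F0-2019", the same text as the broken path segment.
        System.out.println(narrowToLowByte(segment));
    }
}
```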

To summarize, this looks like a bug to me.
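
One possible way to sidestep the problem, not verified against 4.5.8, would
be to percent-encode the URL with java.net.URI before it is handed to
HttpGet, so that HttpClient only ever sees ASCII. A minimal sketch, assuming
the URL parts are available separately (the class PreEncodeDemo and the
helper toAsciiUrl are hypothetical names made up for this illustration):

```java
import java.net.URI;
import java.net.URISyntaxException;

public class PreEncodeDemo {

    // Hypothetical helper: builds a URI from its parts and returns it with all
    // non-ASCII characters percent-encoded as UTF-8.
    static String toAsciiUrl(String scheme, String host, String path, String query) {
        try {
            return new URI(scheme, host, path, query, null).toASCIIString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Invalid URL parts", e);
        }
    }

    public static void main(String[] args) {
        // Unicode escapes for "кириллица-2019".
        String word = "\u043A\u0438\u0440\u0438\u043B\u043B\u0438\u0446\u0430-2019";

        // The result contains only ASCII characters and could be passed
        // to new HttpGet(...) in place of the raw URL.
        System.out.println(toAsciiUrl("http", "google.com", "/" + word + "/", "q=" + word));
    }
}
```

Whether the pre-encoded form actually avoids the broken path in 4.5.8 would
still need to be checked with the test above.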


Thank you,
Denis Malyshkin.
