On Wed, 2019-07-17 at 15:23 +0600, Denis Malyshkin wrote:
> Hello,
>
> After upgrading to HttpClient version 4.5.8+ we found that requests
> with Cyrillic characters are broken. Below is a simple test that
> exposes the issue with HttpClient version 4.5.8:
> ===================================
> public void cyrillicSymbolsExtraTest() throws Exception {
>     String urlStr = "http://google.com/кириллица-2019/?q=кириллица-2019";
>     URL url = new URL(urlStr);
>     HttpUriRequest req = new HttpGet(url.toString());
>
>     // Prints "http://google.com/%D0%BA%D0%B8%D1%80%D0%B8%D0%BB%D0%BB%D0%B8%D1%86%D0%B0-2019/?q=%D0%BA%D0%B8%D1%80%D0%B8%D0%BB%D0%BB%D0%B8%D1%86%D0%B0-2019"
>     System.out.println(req.getRequestLine().getUri());
>
>     HttpClientContext context = HttpClientContext.create();
>     HttpClient client = HttpClients.custom().build();
>     HttpResponse resp = client.execute(req, context);
>
>     Assert.assertEquals(req.getRequestLine().getUri(),
>             "http://google.com" + context.getRequest().getRequestLine().getUri());
>     // Expected : http://google.com/%D0%BA%D0%B8%D1%80%D0%B8%D0%BB%D0%BB%D0%B8%D1%86%D0%B0-2019/?q=%D0%BA%D0%B8%D1%80%D0%B8%D0%BB%D0%BB%D0%B8%D1%86%D0%B0-2019
>     // Actual   : http://google.com/:8@8;;8F0-2019/?q=%D0%BA%D0%B8%D1%80%D0%B8%D0%BB%D0%BB%D0%B8%D1%86%D0%B0-2019
> }
> ===================================
>
> With HttpClient 4.5.7 the test passes.
>
> Yes, I know that non-ASCII characters aren't allowed in URLs. But I am
> worried about the following aspects of the behavior shown above:
>
> 1. req.getRequestLine().getUri() returns the correctly URL-encoded
>    URI, but the request is sent to an address with an incorrect
>    path -- "http://google.com/:8@8;;8F0-2019/".
>
> 2. If the URL is incorrect, it seems very weird to me to send the
>    request to a broken URL instead of returning an error.
>
> 3. There is an inconsistency between the encoding of the URL path
>    part and the URL query part -- the path part becomes broken while
>    the query part is correctly URL-encoded.
>
This is a classic case of the "garbage in, garbage out" rule. Please do
not use invalid characters in URI components.

Oleg
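
One way to follow the advice above is to percent-encode the URI before it
ever reaches HttpClient. The following is a minimal sketch using only
java.net.URI from the JDK; the class name AsciiUriSketch and the helper
toAsciiUri are hypothetical names chosen for illustration and are not part
of HttpClient. It assumes the path and query are available as raw,
unencoded strings.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class AsciiUriSketch {

    // Hypothetical helper (not an HttpClient API): builds a URI from raw,
    // unencoded components and returns it as a US-ASCII string.
    public static String toAsciiUri(String scheme, String authority,
                                    String path, String query) {
        try {
            // The multi-argument constructor quotes characters that are
            // illegal in each component. Non-ASCII letters are left as-is
            // at this stage.
            URI uri = new URI(scheme, authority, path, query, null);
            // toASCIIString() percent-encodes the remaining non-ASCII
            // characters as UTF-8 octets.
            return uri.toASCIIString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Cannot build URI", e);
        }
    }

    public static void main(String[] args) {
        String encoded = toAsciiUri("http", "google.com",
                "/кириллица-2019/", "q=кириллица-2019");
        System.out.println(encoded);
    }
}
```

The resulting string contains only ASCII characters, so it could then be
passed to new HttpGet(...) in place of the raw URL from the test above.
Whether this avoids the 4.5.8 behavior in every case is not confirmed in
this thread.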