On Wed, 2019-07-17 at 15:23 +0600, Denis Malyshkin wrote:
> Hello,
> 
> After upgrading to HttpClient 4.5.8+ we found that requests with
> Cyrillic characters are broken. Below is a simple test that exposes
> the issue with HttpClient 4.5.8:
> ===================================
> public void cyrillicSymbolsExtraTest() throws Exception {
>   String urlStr = "http://google.com/кириллица-2019/?q=кириллица-2019";
>   URL url = new URL(urlStr);
>   HttpUriRequest req = new HttpGet(url.toString());
> 
>   // Prints "http://google.com/%D0%BA%D0%B8%D1%80%D0%B8%D0%BB%D0%BB%D0%B8%D1%86%D0%B0-2019/?q=%D0%BA%D0%B8%D1%80%D0%B8%D0%BB%D0%BB%D0%B8%D1%86%D0%B0-2019"
>   System.out.println(req.getRequestLine().getUri());
> 
>   HttpClientContext context = HttpClientContext.create();
>   HttpClient client = HttpClients.custom().build();
>   HttpResponse resp = client.execute(req, context);
> 
>   Assert.assertEquals(req.getRequestLine().getUri(),
>       "http://google.com" + context.getRequest().getRequestLine().getUri());
>   // Expected: http://google.com/%D0%BA%D0%B8%D1%80%D0%B8%D0%BB%D0%BB%D0%B8%D1%86%D0%B0-2019/?q=%D0%BA%D0%B8%D1%80%D0%B8%D0%BB%D0%BB%D0%B8%D1%86%D0%B0-2019
>   // Actual:   http://google.com/:8@8;;8F0-2019/?q=%D0%BA%D0%B8%D1%80%D0%B8%D0%BB%D0%BB%D0%B8%D1%86%D0%B0-2019
> }
> ===================================
> 
> With HttpClient 4.5.7 the test passes.
> 
> Yes, I know that non-ASCII characters aren't allowed in URLs, but the
> following aspects of this behavior worry me:
> 
> 1. req.getRequestLine().getUri() returns the correctly URL-encoded
> URI, but the request is sent to an address with an incorrect path --
> "http://google.com/:8@8;;8F0-2019/".
> 
> 2. If the URL is incorrect, it seems very weird to me to send the
> request to a broken URL instead of returning an error.
> 
> 3. There is an inconsistency between the encoding of the URL path
> part and the URL query part -- the path part becomes broken while the
> query part is correctly URL-encoded.
> 
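
A side note on the "Actual" path in the test above: the string ":8@8;;8F0" is what you get when each Cyrillic character is cut down to its low 8 bits (for example, U+043A becomes 0x3A, which is ':'). The sketch below reproduces that string with plain Java. It only illustrates the arithmetic. The class and method names are made up for this example, and it is not a claim about what HttpClient does internally.

```java
public class LowByteSketch {

    // Hypothetical helper: keeps only the low 8 bits of every UTF-16 code
    // unit, which is what a naive char-to-byte narrowing would do.
    static String truncateToLowByte(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            sb.append((char) (s.charAt(i) & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The Cyrillic path from the test, written with Unicode escapes so
        // that the source file encoding does not matter.
        String path = "/\u043A\u0438\u0440\u0438\u043B\u043B\u0438\u0446\u0430-2019/";
        System.out.println(truncateToLowByte(path)); // prints "/:8@8;;8F0-2019/"
    }
}
```

ASCII characters such as '/', '-' and the digits are below 0x80, so they pass through unchanged, which is why only the Cyrillic part of the path looks damaged.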

This is a classic case of the "garbage in, garbage out" rule. Please do
not use invalid characters in URI components.
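
One way to follow this advice is to percent-encode the URI before it reaches HttpClient. The sketch below uses only java.net.URI from the JDK: the multi-argument constructor takes the raw (unencoded) components, and toASCIIString() emits non-ASCII characters as UTF-8 percent escapes. The class and method names are hypothetical, and this is one possible approach under those assumptions rather than the recommended HttpClient way.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class UriEncodingSketch {

    // Hypothetical helper: builds a URI from raw components and returns
    // its US-ASCII form, with non-ASCII characters percent-encoded as UTF-8.
    static String toAsciiUri(String scheme, String authority, String path, String query)
            throws URISyntaxException {
        return new URI(scheme, authority, path, query, null).toASCIIString();
    }

    public static void main(String[] args) throws URISyntaxException {
        // The Cyrillic word from the test, written with Unicode escapes so
        // that the source file encoding does not matter.
        String word = "\u043A\u0438\u0440\u0438\u043B\u043B\u0438\u0446\u0430";
        String encoded = toAsciiUri("http", "google.com",
                "/" + word + "-2019/", "q=" + word + "-2019");
        System.out.println(encoded);
        // The ASCII-only string can then be handed to new HttpGet(encoded).
    }
}
```

Note that the multi-argument URI constructors always quote a literal '%', so they should be given unencoded components. Passing an already-encoded path would encode it a second time.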

Oleg


---------------------------------------------------------------------
To unsubscribe, e-mail: httpclient-users-unsubscr...@hc.apache.org
For additional commands, e-mail: httpclient-users-h...@hc.apache.org
