[ 
https://issues.apache.org/jira/browse/HTTPCLIENT-1990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Oleg Kalnichevski resolved HTTPCLIENT-1990.
-------------------------------------------
    Resolution: Invalid

> URIUtils.rewriteURI mangles unicode characters
> ----------------------------------------------
>
>                 Key: HTTPCLIENT-1990
>                 URL: https://issues.apache.org/jira/browse/HTTPCLIENT-1990
>             Project: HttpComponents HttpClient
>          Issue Type: Bug
>          Components: HttpCache
>    Affects Versions: 4.5.8
>            Reporter: Nicholas Wilson
>            Priority: Minor
>
> The following test case illustrates a problem with URIUtils that I have 
> encountered:
> {code:java}
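> import java.net.URI;
> import org.apache.http.client.utils.URIUtils;
> import org.springframework.web.util.UriComponentsBuilder;
>
> public class Main {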
>   public static void main(String[] args) throws Exception {
>     URI uri = UriComponentsBuilder.fromUriString("https://host/path")
>       .pathSegment("üñîçøðé")
>       .build()
>       .toUri();
>     System.out.printf("rawPath = %s\n", uri.getRawPath());
>     System.out.printf("path    = %s\n", uri.getPath());
>     uri = URIUtils.rewriteURI(uri, null, URIUtils.DROP_FRAGMENT_AND_NORMALIZE);
>     System.out.printf("rawPath = %s\n", uri.getRawPath());
>     System.out.printf("path    = %s\n", uri.getPath());
>   }
> }
> {code}
> I encountered the issue because previous versions of httpclient didn't 
> perform path normalisation (the main caller is ProtocolExec in the HTTP 
> client) and effectively only did URIUtils.DROP_FRAGMENT, so users who 
> upgrade get the new normalisation behaviour unexpectedly.
> The bug exhibited by URIUtils.rewriteURI is actually caused by 
> URLEncodedUtils.urlDecode (inside URIBuilder's constructor, which calls 
> URIBuilder.parsePath), which does something truly nasty. It takes a String (a 
> logical sequence of Unicode code points), wraps it in a CharBuffer, then 
> iterates over it, narrowing each char to a byte! Strange, but true.
> Unicode characters in a java.net.URI are legal, as far as I can tell, and 
> should simply be escaped as percent-encoded UTF-8 bytes, as they are in the 
> value returned by URI.getRawPath. They are not escaped in the value returned 
> by URI.getPath, which is what URIUtils.rewriteURI uses.
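
The behaviour described in the report can be illustrated with the JDK alone. The sketch below is not HttpClient's code: the class name PathEncodingDemo and the helpers narrowThenDecode and encodeThenDecode are hypothetical, and the narrowing loop is only a simplified stand-in for what the reporter says URLEncodedUtils.urlDecode does. It first shows that URI.getRawPath keeps the percent-encoded form while URI.getPath returns the decoded character, then shows that narrowing each char of that decoded path to one byte does not survive a UTF-8 round trip, whereas encoding the string to UTF-8 bytes first does.

```java
import java.net.URI;
import java.nio.charset.StandardCharsets;

public class PathEncodingDemo {

    // Hypothetical helper: narrows every char to a single byte (dropping the
    // high 8 bits), then decodes the resulting bytes as UTF-8.
    static String narrowThenDecode(String s) {
        byte[] bytes = new byte[s.length()];
        for (int i = 0; i < s.length(); i++) {
            bytes[i] = (byte) s.charAt(i);
        }
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Hypothetical helper: encodes the string to UTF-8 bytes before decoding,
    // so non-ASCII characters become valid multi-byte sequences.
    static String encodeThenDecode(String s) {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // %C3%BC is the percent-encoded UTF-8 form of U+00FC.
        URI uri = URI.create("https://host/path/%C3%BC");

        // getRawPath keeps the escapes; getPath decodes them.
        System.out.println("rawPath = " + uri.getRawPath());
        System.out.println("path    = " + uri.getPath());

        String decoded = uri.getPath();
        System.out.println("narrowed round trip intact: "
                + narrowThenDecode(decoded).equals(decoded));
        System.out.println("encoded round trip intact:  "
                + encodeThenDecode(decoded).equals(decoded));
    }
}
```

For pure ASCII paths both helpers return the input unchanged, which would explain why a problem of this kind only shows up once a decoded path contains characters outside the ASCII range.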



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
