[
https://issues.apache.org/jira/browse/HTTPCLIENT-1708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15072913#comment-15072913
]
Idan Sheinberg edited comment on HTTPCLIENT-1708 at 12/28/15 5:04 PM:
----------------------------------------------------------------------
Sure thing, and thanks for your prompt response.
This is a method that tries to access an S3 JPEG URL which contains accented
Latin characters (extended ASCII).
Note the printouts and see how the URL gets escaped:

    import java.io.IOException;
    import java.net.URI;
    import java.net.URISyntaxException;
    import org.apache.http.client.methods.CloseableHttpResponse;
    import org.apache.http.client.methods.HttpHead;
    import org.apache.http.impl.client.CloseableHttpClient;
    import org.apache.http.impl.client.HttpClientBuilder;

    public static void main(String[] args) throws IOException, URISyntaxException {
        CloseableHttpClient hc = HttpClientBuilder.create().build();
        URI uriGoneBad = new URI("http://uritest.s3.amazonaws.com/Dépot_Électros.jpg");
        HttpHead headRequest = new HttpHead(uriGoneBad);
        // Compare the URI as written, as escaped by the JDK, and as sent in the request line.
        System.out.println(uriGoneBad.toString());
        System.out.println(uriGoneBad.toASCIIString());
        System.out.println(headRequest.getRequestLine());
        try (CloseableHttpResponse httpResponse = hc.execute(headRequest)) {
            System.out.println(httpResponse.getStatusLine().getStatusCode());
        }
    }

If you take the link and paste it in a browser, you can see it gets escaped
like this (this URI will work with HttpClient):
http://uritest.s3.amazonaws.com/De%CC%81pot_E%CC%81lectros.jpg
The %CC%81 sequences show that the accented letters in the file name are stored
in decomposed form: a base letter followed by the combining acute accent U+0301.
If you run URLEncoder.encode("Dépot_Électros.jpg", "UTF-8"), you will get
the same escape pattern.
As noted in the description, the URI object internally uses the JDK
"Normalizer" class, which changes the characters before they are escaped.
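
The difference described above can be reproduced without HttpClient or network access. The following is a minimal sketch using only JDK classes; the class name NormalizationDemo and its helper methods are made up for illustration, and the file name is written with explicit \u0301 escapes on the assumption that the original name uses decomposed accents.

```java
import java.io.UnsupportedEncodingException;
import java.net.URI;
import java.net.URLEncoder;
import java.text.Normalizer;

// Hypothetical demo class, not part of HttpClient or the JDK.
class NormalizationDemo {
    // "Dépot_Électros.jpg" with decomposed accents: base letter + U+0301.
    static final String DECOMPOSED = "De\u0301pot_E\u0301lectros.jpg";

    // Escapes via java.net.URI, which normalizes to NFC before percent-encoding.
    static String viaUri(String fileName) {
        return URI.create("http://uritest.s3.amazonaws.com/" + fileName).toASCIIString();
    }

    // Escapes via URLEncoder, which encodes the characters exactly as given.
    static String viaUrlEncoder(String fileName) {
        try {
            return URLEncoder.encode(fileName, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always available", e);
        }
    }

    public static void main(String[] args) {
        // true: the input is in decomposed (NFD) form
        System.out.println(Normalizer.isNormalized(DECOMPOSED, Normalizer.Form.NFD));
        // http://uritest.s3.amazonaws.com/D%C3%A9pot_%C3%89lectros.jpg
        System.out.println(viaUri(DECOMPOSED));
        // De%CC%81pot_E%CC%81lectros.jpg
        System.out.println(viaUrlEncoder(DECOMPOSED));
    }
}
```

The two escaped forms name different byte sequences, so a server that compares object keys byte for byte (as S3 appears to do here) treats them as different resources.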
> Issues With Extended ASCII ( Latin Chars ) Escaping with HttpClient 4.5.1
> -------------------------------------------------------------------------
>
> Key: HTTPCLIENT-1708
> URL: https://issues.apache.org/jira/browse/HTTPCLIENT-1708
> Project: HttpComponents HttpClient
> Issue Type: Bug
> Components: HttpClient
> Affects Versions: 4.5.1
> Environment: Ubuntu Variant 15.04
> Netbeans 8.1 IDE
> Java 8 ( Oracle JDK 1.8.6x)
> Reporter: Idan Sheinberg
>
> Hey guys,
> I'm trying to send an HttpClient HttpHead request for the following URL:
> 'http://some.domain.com/amnetcanadaplatform/HomeDépot_Électros_WEB22s_video.mp4'
> I notice it gets escaped as
> 'http://some.domain.com/amnetcanadaplatform/HomeD%C3%A9pot_%C3%89lectros_WEB22s_video.mp4'
> while other programs/utilities/frameworks expect it to be
> 'http://some.domain.com/amnetcanadaplatform/HomeDe%CC%81pot_E%CC%81lectros_WEB22s_video.mp4'
> I've dug through the source code and tracked the issue down to
> "toASCIIString()" of the Java URI object (WHICH OF COURSE IS NOT YOUR
> RESPONSIBILITY) being called in order to retrieve the request line:
> Class : org.apache.http.client.methods.HttpRequestWrapper
> Method : getRequestLine()
> Line : 113
> Internally, the line 'String ns = Normalizer.normalize(s,
> Normalizer.Form.NFC);' recomposes the chars so their Unicode code points change,
> which causes the 'inappropriate' values to appear in the escaped URI:
> Class : java.net.URI
> Method : encode(String s)
> Line : 2723
> Now I know it would be extra hard to even get to the point of finding out whether
> this is a Java language issue, but I don't believe Unicode manipulation of
> bytes is desired behavior for URL encoding. Is there any specific reason
> why you used the "toASCIIString()" method instead of the plain "toString()"
> method?
> Do you think there's a chance this issue can be resolved on your end?
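
One possible caller-side workaround, independent of whether HttpClient changes anything, is to percent-encode the non-ASCII characters before the string ever reaches java.net.URI. Once the URI contains only ASCII, toASCIIString() has nothing left to normalize. The sketch below uses only JDK classes; RawPercentEncoder and encodeNonAscii are hypothetical names, and the helper deliberately leaves every ASCII character untouched, so it assumes the rest of the URL is already valid.

```java
import java.net.URI;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of HttpClient or the JDK.
class RawPercentEncoder {
    // Percent-encodes every non-ASCII character as its UTF-8 bytes,
    // without applying any Unicode normalization first.
    static String encodeNonAscii(String s) {
        StringBuilder out = new StringBuilder(s.length() + 16);
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            int v = b & 0xFF;
            if (v < 0x80) {
                out.append((char) v);          // ASCII byte: copy as-is
            } else {
                out.append('%').append(String.format("%02X", v));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Decomposed accents written as explicit escapes (assumption about the original name).
        String raw = "http://uritest.s3.amazonaws.com/De\u0301pot_E\u0301lectros.jpg";
        URI safe = URI.create(encodeNonAscii(raw));
        // http://uritest.s3.amazonaws.com/De%CC%81pot_E%CC%81lectros.jpg
        System.out.println(safe.toASCIIString());
    }
}
```

The resulting URI can then be passed to new HttpHead(URI) as in the reproduction code, and the request line should carry the %CC%81 form unchanged.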
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)