Johan Compagner created HTTPCLIENT-2244:
-------------------------------------------
Summary: default response encoding is US.ASCII but it should be ISO-8859-1
Key: HTTPCLIENT-2244
URL: https://issues.apache.org/jira/browse/HTTPCLIENT-2244
Project: HttpComponents HttpClient
Issue Type: Bug
Components: HttpClient (async)
Affects Versions: 5.1.3
Reporter: Johan Compagner
Here (and in the getBodyBytes() method above it):
[httpcomponents-client/SimpleBody.java at master · apache/httpcomponents-client
(github.com)|https://github.com/apache/httpcomponents-client/blob/master/httpclient5/src/main/java/org/apache/hc/client5/http/async/methods/SimpleBody.java#L86]
you can see that the default charset used to read a body is
StandardCharsets.US_ASCII. That is not correct; it should be
StandardCharsets.ISO_8859_1. This was the default in HC 4.x, as also described
in the docs:
[https://hc.apache.org/httpclient-legacy/charencodings.html]
I know the server should specify the encoding because of the different
interpretations, but we don't always control the server or what it does.
According to the spec:
[https://www.w3.org/International/articles/http-charset/index]
the default should be ISO_8859_1.
Our clients who use this now suddenly see that German umlauts are not
transferred correctly, whereas with HC4 they worked fine.
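
To illustrate the difference, here is a minimal sketch that uses only the JDK and no HttpClient classes. The class CharsetDefaultDemo and the helper decodeBody are hypothetical names made up for this example; they are not HttpClient API. It assumes a server that sends ISO-8859-1 bytes without declaring a charset, and decodes the same bytes once with each fallback.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetDefaultDemo {

    // Hypothetical helper, not HttpClient API: decode a response body with the
    // charset the server declared, or with the given fallback when it declared none.
    static String decodeBody(byte[] body, Charset declared, Charset fallback) {
        return new String(body, declared != null ? declared : fallback);
    }

    public static void main(String[] args) {
        // A German word containing u-umlaut and sharp s, encoded as ISO-8859-1.
        // Unicode escapes are used so the source file encoding does not matter.
        byte[] body = "Gr\u00fc\u00dfe".getBytes(StandardCharsets.ISO_8859_1);

        // The server declared no charset (declared == null) in both cases.
        String asAscii = decodeBody(body, null, StandardCharsets.US_ASCII);
        String asLatin1 = decodeBody(body, null, StandardCharsets.ISO_8859_1);

        // US-ASCII has no mapping for bytes above 0x7F, so the JDK substitutes
        // the replacement character U+FFFD for each of them.
        System.out.println(asAscii.equals("Gr\uFFFD\uFFFDe"));

        // ISO-8859-1 maps every byte to the code point of the same value,
        // so the original text survives.
        System.out.println(asLatin1.equals("Gr\u00fc\u00dfe"));
    }
}
```

Until the default changes, one possible workaround (an assumption, not something confirmed by the project) is to take the raw bytes, for example from getBodyBytes(), and decode them explicitly with StandardCharsets.ISO_8859_1 when the response carries no charset.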