Joakim Erdfelt created HTTPCLIENT-2400:
------------------------------------------
Summary: URLEncodedUtils.encodeFormFields() has incorrect javadoc
Key: HTTPCLIENT-2400
URL: https://issues.apache.org/jira/browse/HTTPCLIENT-2400
Project: HttpComponents HttpClient
Issue Type: Bug
Components: HttpClient (classic)
Affects Versions: 5.5.1
Reporter: Joakim Erdfelt
The javadoc for URLEncodedUtils.encodeFormFields() says ...
> Encode/escape www-url-form-encoded content.
> Uses the URLENCODER set of characters, rather than the UNRESERVED set; this
> is for compatibility with previous releases, URLEncoder.encode() and most
> browsers.
However, this method is not compatible with URLEncoder.encode() when a
non-UTF-8 charset is used.
If we take a Japanese character and encode it once with
URLEncodedUtils.encodeFormFields() and once with URLEncoder.encode(), we get
different results.
I'll use the following letter ...
KATAKANA LETTER HO: ホ
https://unicodeplus.com/U+30DB
Using URLEncodedUtils.encodeFormFields("ホ", Charset.forName("Shift_JIS"))
Result: "%83z"
Using Java's URLEncoder.encode("ホ", Charset.forName("Shift_JIS"))
Result: "%83%7A"
The result from URLEncoder.encode() is actually correct, even though the
byte 0x7A ("z") is part of the UNRESERVED set.
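The difference can be reproduced with the JDK alone. The sketch below is not the HttpClient source: encodeBytewise() and isSafe() are hypothetical helpers that assume the encoder converts the string to bytes first and then passes every byte in a "safe" set through unescaped. Under that assumption, the Shift_JIS trail byte 0x7A comes out as a literal "z", whereas URLEncoder.encode() percent-encodes every byte of a character it has to escape.

```java
import java.net.URLEncoder;
import java.nio.charset.Charset;

public class ShiftJisEncodingSketch {

    // Hypothetical byte-wise encoder (an assumption, not HttpClient code):
    // the safe-set check is applied to each encoded byte, not to each character.
    static String encodeBytewise(String content, Charset charset) {
        StringBuilder out = new StringBuilder();
        for (byte raw : content.getBytes(charset)) {
            int b = raw & 0xff;
            if (isSafe(b)) {
                out.append((char) b);
            } else if (b == ' ') {
                out.append('+');
            } else {
                out.append('%')
                   .append(Character.toUpperCase(Character.forDigit(b >> 4, 16)))
                   .append(Character.toUpperCase(Character.forDigit(b & 0xf, 16)));
            }
        }
        return out.toString();
    }

    // Characters that java.net.URLEncoder leaves unescaped (space is handled separately).
    static boolean isSafe(int b) {
        return (b >= 'a' && b <= 'z') || (b >= 'A' && b <= 'Z')
            || (b >= '0' && b <= '9')
            || b == '-' || b == '_' || b == '.' || b == '*';
    }

    public static void main(String[] args) {
        Charset shiftJis = Charset.forName("Shift_JIS");
        String ho = "\u30DB"; // KATAKANA LETTER HO

        System.out.println(encodeBytewise(ho, shiftJis));    // byte-wise safe-set check
        System.out.println(URLEncoder.encode(ho, shiftJis)); // JDK behaviour
    }
}
```

URLEncoder.encode(String, Charset) requires Java 10 or later.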
Interestingly, if you use Java's URLDecoder to decode the output that
URLEncodedUtils produces, you get a replacement character.
Example, with jshell ...
{code}
$ jshell
| Welcome to JShell -- Version 17.0.15
| For an introduction type: /help intro
jshell> var shiftJisCharset = java.nio.charset.Charset.forName("Shift-JIS")
shiftJisCharset ==> Shift_JIS
jshell> var result = URLEncoder.encode("ホ", shiftJisCharset)
result ==> "%83%7A"
jshell> var result = URLDecoder.decode("%83%7A", shiftJisCharset)
result ==> "ホ"
jshell> var result = URLDecoder.decode("%83z", shiftJisCharset)
result ==> "�z"
{code}
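A likely explanation for the replacement character, sketched with the JDK only: URLDecoder decodes each run of consecutive %XX escapes as one byte sequence, so in "%83z" the lead byte 0x83 is decoded on its own, without its trail byte. The class name and the decodeBytes() helper below are illustrative, not part of any library.

```java
import java.net.URLDecoder;
import java.nio.charset.Charset;

public class ShiftJisDecodingSketch {

    // Illustrative helper: decode raw byte values with the given charset.
    static String decodeBytes(Charset charset, int... values) {
        byte[] bytes = new byte[values.length];
        for (int i = 0; i < values.length; i++) {
            bytes[i] = (byte) values[i];
        }
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        Charset shiftJis = Charset.forName("Shift_JIS");

        // Lead byte and trail byte together form KATAKANA LETTER HO.
        System.out.println(decodeBytes(shiftJis, 0x83, 0x7A));

        // The lead byte alone is malformed input and becomes U+FFFD.
        System.out.println(decodeBytes(shiftJis, 0x83));

        // URLDecoder sees "%83" as a one-byte run, then a literal 'z'.
        System.out.println(URLDecoder.decode("%83z", shiftJis));
    }
}
```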