Re: RFR [8058875]: CharsetEncoder.maxBytesPerChar() should return 4 for UTF-8

Xueming Shen Mon, 22 Sep 2014 14:34:51 -0700

On 09/22/2014 01:14 PM, Ivan Gerasimov wrote:

Hello!


The UTF-8 encoding allows characters that are 4 bytes long.
However, CharsetEncoder.maxBytesPerChar() currently returns 3.0, which is
not always enough.

Would you please review the simple fix for this issue?

BUGURL: https://bugs.openjdk.java.net/browse/JDK-8058875
WEBREV: http://cr.openjdk.java.net/~igerasim/8058875/0/webrev/

Sincerely yours,
Ivan


The "character" in the nio Charset and CharsetDecoder/CharsetEncoder is
specified as a "sixteen-bit Unicode code unit", so it is reasonable to
interpret the "character" in "maximum number of bytes that will be produced
for each character of input" as the Java "char" as well. In the case of
UTF-8, each supplementary character in 4-byte form is always encoded from
2 surrogate chars, so it is "2 bytes per char". Do we have a real escalation
that complains about this?
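
To make the per-char arithmetic concrete, here is a minimal sketch. The class
name Utf8BytesPerChar and the helper utf8Length are made up for illustration,
and U+20AC and U+1F600 are just sample code points; the value printed for
maxBytesPerChar depends on the JDK in use.

```java
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

public class Utf8BytesPerChar {
    // Hypothetical helper: number of UTF-8 bytes produced for the whole string.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        // BMP character: one Java char, 3-byte UTF-8 form.
        String euro = "\u20AC";
        // Supplementary character: two Java chars (a surrogate pair), 4-byte UTF-8 form.
        String emoji = new String(Character.toChars(0x1F600));

        System.out.println(euro.length() + " char(s) -> " + utf8Length(euro) + " byte(s)");
        System.out.println(emoji.length() + " char(s) -> " + utf8Length(emoji) + " byte(s)");

        // The documented bound is per sixteen-bit char, not per code point.
        CharsetEncoder enc = StandardCharsets.UTF_8.newEncoder();
        System.out.println("maxBytesPerChar = " + enc.maxBytesPerChar());
    }
}
```

The supplementary code point takes 4 bytes, but it also occupies 2 chars of
input, so the worst case per char is still the 3-byte BMP form.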

-Sherman
