Again, it's maxBytes per Java "char", not maxBytes per Unicode character.
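To make the distinction concrete, here is a small sketch. The class and method names are mine, purely for illustration; only CharsetEncoder.maxBytesPerChar(), String.length() and String.getBytes(Charset) are real JDK calls. It assumes the UTF-8 encoder reports 3.0, as discussed in this thread; the code does not depend on that exact value.

```java
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

public class MaxBytesPerCharDemo {

    // Upper bound on the encoded size, computed the way the API intends:
    // number of Java chars (UTF-16 code units) times maxBytesPerChar().
    public static int worstCaseBytes(String s) {
        CharsetEncoder enc = StandardCharsets.UTF_8.newEncoder();
        return (int) Math.ceil(s.length() * enc.maxBytesPerChar());
    }

    // Actual number of bytes produced when the string is encoded as UTF-8.
    public static int encodedBytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        // U+1F600 is one supplementary code point, but two Java chars
        // (a surrogate pair) and four UTF-8 bytes.
        String s = "\uD83D\uDE00";
        System.out.println("code points: " + s.codePointCount(0, s.length()));
        System.out.println("Java chars:  " + s.length());
        System.out.println("UTF-8 bytes: " + encodedBytes(s));
        System.out.println("upper bound: " + worstCaseBytes(s));
    }
}
```

The surrogate pair costs 4 bytes for 2 chars, i.e. 2 bytes per char, so a bound based on chars stays safe even though the same 4 bytes belong to a single Unicode character.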
Allocating a big enough buffer is pretty much the only reason for
maxBytesPerChar's existence.

On Tue, Sep 23, 2014 at 7:58 AM, Salter, Thomas A <thomas.sal...@unisys.com> wrote:

> This response confuses me. Are you saying that the UTF8 encoder is not
> really producing UTF8? RFC 2279 and 3629 both clearly state that
> surrogates must be combined to form a 32-bit value which is then encoded as
> a 4-byte sequence. In fact, the RFCs refer to the alternate encoding
> CESU_8 definition which encodes each half of the surrogate pair as a 3-byte
> UTF-8 sequence.
>
> I guess returning 3.0 for maxBytesPerChar works for the purpose of
> allocating a big enough byte buffer, but the explanation in this thread is
> confusing.
>
> Tom Salter
>
> ------------------------------
> Date: Tue, 23 Sep 2014 11:37:07 +0400
> From: Ivan Gerasimov <ivan.gerasi...@oracle.com>
> To: Xueming Shen <xueming.s...@oracle.com>, Martin Buchholz
>     <marti...@google.com>
> Cc: nio-...@openjdk.java.net, core-libs-dev
>     <core-libs-dev@openjdk.java.net>
> Subject: Re: RFR [8058875]: CharsetEncoder.maxBytesPerChar() should
>     return 4 for UTF-8
> Message-ID: <54212323.5080...@oracle.com>
> Content-Type: text/plain; charset=UTF-8; format=flowed
>
> Martin, Sherman thanks for clarification!
>
> Closing the bug as not a bug.
>
> > The "character" in the nio Charset and CharDe/Encoder is specified as
> > "sixteen-bit Unicode
> > code unit", so it is reasonable to interpret the "character" in the
> > "maximum number of bytes
> > that will be produced for each character of input" to be the Java
> > "char" as well. In case of
> > UTF8, each 4-byte form supplementary character is always coded into 2
> > surrogate chars,
> > it's "2 byte per char".
>
> > Do we have a real escalation that complains about this?
>
> Yes, the link is on the bug page:
> https://bugs.openjdk.java.net/browse/JDK-8058875
> I'm going to try to explain what I've just realized about this function :-)
>
> Sincerely yours,
> Ivan