Re: JDK 11 RFR of JDK-8196995: java.lang.Character should not state UTF-16 encoding is used for strings

Martin Buchholz Thu, 08 Feb 2018 14:21:48 -0800

Instead of saying something specific about char[], String, etc., you can be
more general and say something about the Java Platform as a whole.


"The Java Platform generally processes human language text represented as
sequences of {@code char} values in the UTF-16 encoding of Unicode"

But doing this well would involve a greater overhaul of the class doc.
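To make "sequences of char values in the UTF-16 encoding" concrete, here is a minimal sketch (the class and helper names are illustrative only). It shows that a code point outside the Basic Multilingual Plane occupies two char values, a surrogate pair, while a BMP code point occupies one. Only the standard `Character.toChars` and surrogate-test methods are used.

```java
public class Utf16Units {
    // Hypothetical helper: returns the UTF-16 code units for one Unicode code point.
    static char[] utf16Units(int codePoint) {
        return Character.toChars(codePoint);
    }

    public static void main(String[] args) {
        int grinningFace = 0x1F600; // U+1F600, outside the Basic Multilingual Plane
        char[] units = utf16Units(grinningFace);

        // A supplementary code point needs two char values: a surrogate pair.
        System.out.println(units.length); // 2
        System.out.printf("%04X %04X%n", (int) units[0], (int) units[1]); // D83D DE00
        System.out.println(Character.isHighSurrogate(units[0])
                && Character.isLowSurrogate(units[1])); // true

        // A BMP code point such as 'A' (U+0041) needs only one char.
        System.out.println(utf16Units('A').length); // 1
    }
}
```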

Character will always be confusing because it is ambiguous about whether it
refers to a Unicode character (code point) or a UTF-16 code unit.
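The ambiguity shows up in the API itself: Character has both `char` and `int` (code point) overloads, and `String.length()` counts UTF-16 code units rather than characters. A minimal sketch, with hypothetical helper names, using U+10400 DESERET CAPITAL LETTER LONG I as an example of a letter outside the BMP:

```java
public class CharVsCodePoint {
    // Hypothetical helper: counts UTF-16 code units (what String.length() reports).
    static int codeUnits(String s) {
        return s.length();
    }

    // Hypothetical helper: counts code points, treating a valid surrogate pair as one.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        // U+10400 is a letter encoded as the surrogate pair D801 DC00.
        String s = "a" + new String(Character.toChars(0x10400)) + "b";
        System.out.println(codeUnits(s));  // 4
        System.out.println(codePoints(s)); // 3

        // The char overload sees only the high surrogate, which is not a letter...
        System.out.println(Character.isLetter(s.charAt(1)));      // false
        // ...while the int (code point) overload sees the whole character.
        System.out.println(Character.isLetter(s.codePointAt(1))); // true
    }
}
```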

On Thu, Feb 8, 2018 at 10:59 AM, joe darcy wrote:

> Hello,
>
> On 2/8/2018 3:53 AM, Alan Bateman wrote:
>
>> On 07/02/2018 22:12, joe darcy wrote:
>>
>>> Hello,
>>>
>>> Text in java.lang.Character states a UTF-16 character encoding is used
>>> for java.lang.String. While that was true for many years, it is not necessarily
>>> true and not true in practice as of JDK 9 due to the improvements from JEP
>>> 254: Compact Strings.
>>>
>>> The statement about the encoding should be corrected.
>>>
>>> Please review the patch below, which does this. (I've formatted the patch
>>> so that the change in text is clear; I'll re-flow the paragraph before
>>> pushing.)
>>>
>> I'm not sure that this is worth changing. You could replace "classes"
>> with "API" and add a note to say that an implementation may use a more
>> optimized representation, but I don't think it's really needed.
>>
>>
> In response to this feedback and others, how about:
>
>      [...] The Java
>   * platform uses the UTF-16 representation in {@code char} arrays and
> - * in the {@code String} and {@code StringBuffer} classes. In
> + * presents a UTF-16 model in the string-related API.
>
> IMO anyway, saying "uses a UTF-16 representation for String" is at
> best misleading with the current implementation, since 8 != 16 for the
> compressed representation, which is used for all Latin-1 strings.
>
> Thanks,
>
> -Joe
>
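As a rough illustration of the gap between the UTF-16 model the API presents and the storage JEP 254 may use, here is a sketch that is not the JDK's implementation; `fitsLatin1` and `estimatedPayloadBytes` are hypothetical helpers approximating the idea. A String whose chars all lie in U+0000..U+00FF can be stored at one byte (8 bits) per char; otherwise two bytes (16 bits) per char are needed. Either way, `length()` and the UTF-16 encoding seen through the API are unchanged.

```java
import java.nio.charset.StandardCharsets;

public class CompactStringsSketch {
    // Hypothetical helper: true if every UTF-16 code unit fits in one byte
    // (U+0000..U+00FF), approximating when a compact Latin-1 form is possible.
    static boolean fitsLatin1(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 0xFF) {
                return false;
            }
        }
        return true;
    }

    // Hypothetical estimate of the character payload in bytes: 1 byte per char
    // when compactable, 2 bytes per char otherwise (ignores headers and fields).
    static int estimatedPayloadBytes(String s) {
        return fitsLatin1(s) ? s.length() : 2 * s.length();
    }

    public static void main(String[] args) {
        String latin1 = "caf\u00E9";          // "cafe" with U+00E9, all chars <= U+00FF
        String greek = "\u03B1\u03B2\u03B3";  // three Greek letters, all above U+00FF

        // The public API presents the same UTF-16 model either way.
        System.out.println(latin1.length());                                   // 4
        System.out.println(latin1.getBytes(StandardCharsets.UTF_16BE).length); // 8

        System.out.println(estimatedPayloadBytes(latin1)); // 4
        System.out.println(estimatedPayloadBytes(greek));  // 6
    }
}
```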
