Instead of saying something specific about char[], String, etc., you can be more general and say something about the entire Java Platform:
"The Java Platform generally processes human language text represented as sequences of {@code char} values in the UTF-16 encoding of Unicode."

But doing this well would involve a larger overhaul of the class doc. Character will always be confusing because it is ambiguous whether it refers to a Unicode character or a UTF-16 code unit.

On Thu, Feb 8, 2018 at 10:59 AM, joe darcy <joe.da...@oracle.com> wrote:
> Hello,
>
> On 2/8/2018 3:53 AM, Alan Bateman wrote:
>
>> On 07/02/2018 22:12, joe darcy wrote:
>>
>>> Hello,
>>>
>>> Text in java.lang.Character states a UTF-16 character encoding is used
>>> for java.lang.String. While was true for many years, it is not necessarily
>>> true and not true in practice as of JDK 9 due to the improvements from JEP
>>> 254: Compact Strings.
>>>
>>> The statement about the encoding should be corrected.
>>>
>>> Please review the patch below which does this. (I've formatted the patch
>>> so that the change is text is made clear; I'll re-flow the paragraph before
>>> pushing.
>>>
>> I'm not sure that this is worth changing. You could replace "classes"
>> with "API" and add a note to say that an implementation may use an more
>> optimization representation but I don't think it's really needed.
>>
>>
> In response to this feedback and others, how about:
>
> [...] The Java
> * platform uses the UTF-16 representation in {@code char} arrays and
> - * in the {@code String} and {@code StringBuffer} classes. In
> + * presents a UTF-16 model in the string-related API.
>
> IMO anyway, I think saying "uses a UTF-16 representation for String" is at
> best misleading with the current implementation since 8 != 16 for the
> compressed representation is used for all Latin-1 strings.
>
> Thanks,
>
> -Joe
>
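For concreteness, here is a minimal sketch of what "presents a UTF-16 model in the string-related API" means in practice. It uses only standard java.lang.String and java.lang.Character methods; the class name Utf16ModelDemo is just an illustrative placeholder. Whatever internal storage the implementation picks (e.g. the Latin-1 compact form from JEP 254), these calls still report char values as UTF-16 code units, which is also where the "character vs. code unit" ambiguity in Character comes from.

```java
public class Utf16ModelDemo {
    public static void main(String[] args) {
        // "A" is a Latin-1 character; U+1F600 is a supplementary code point,
        // which the API exposes as a surrogate pair of two char values.
        String s = "A\uD83D\uDE00";

        // length() counts UTF-16 code units (char values), not Unicode characters.
        System.out.println(s.length());                              // 3
        // codePointCount() counts Unicode code points instead.
        System.out.println(s.codePointCount(0, s.length()));         // 2
        // charAt() returns a single code unit: here, the high surrogate.
        System.out.println(Character.isHighSurrogate(s.charAt(1)));  // true
        // codePointAt() decodes the surrogate pair back into one code point.
        System.out.println(Integer.toHexString(s.codePointAt(1)));   // 1f600

        // A Latin-1-only string may be stored compactly (8 bits per char)
        // on JDK 9+, but the API still reports the same UTF-16 view.
        System.out.println("Hello".length());                        // 5
    }
}
```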