On Wed, 30 Jul 2025 14:18:49 GMT, Chen Liang <li...@openjdk.org> wrote:
>> In #24773, people were concerned that the layout of a UTF16 byte array and a
>> char array may be incompatible. In fact, they are - they are asserted in a
>> corner in `LibraryCallKit::inline_string_char_access` in `library_call.cpp`.
>>
>> In addition, another frequent error I see is that contributors have confused
>> the meaning of indices in StringUTF16 - the indices are always in char array
>> indices. I think we should make these explicit to help future maintenance.
>
> Chen Liang has updated the pull request incrementally with one additional
> commit since the last revision:
>
>   Add paragraph for endianness and layout

> All indices and sizes for byte arrays carrying UTF-16 data are in number of
> `char`s instead of number of bytes.

@liach, given the relatively big API surface of `j.l.StringUTF16`, are we
certain about this?

src/java.base/share/classes/java/lang/StringUTF16.java line 51:

> 49: ///
> 50: /// All indices and sizes for byte arrays carrying UTF16 data are in number of
> 51: /// chars instead of number of bytes.

Nit on cosmetics:

Suggestion:

/// UTF-16 `String` operations.
///
/// UTF-16 byte arrays have the identical layout as `char` arrays. They share the
/// same base offset and scale, and for each two-byte unit interpreted as a `char`,
/// it has the same endianness as a `char`, which is the platform endianness.
/// This is ensured in the static initializer of [StringUTF16].
///
/// All indices and sizes for byte arrays carrying UTF-16 data are in number of
/// `char`s instead of number of bytes.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/26541#issuecomment-3137375297
PR Review Comment: https://git.openjdk.org/jdk/pull/26541#discussion_r2243513617
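
For readers less familiar with the convention discussed above, here is a minimal sketch of what "indices in number of `char`s" means for a byte array carrying UTF-16 data. The class `Utf16IndexSketch` and its methods are hypothetical and written only for illustration; they are not the `StringUTF16` methods. The sketch assumes each two-byte unit is stored in platform byte order, as the suggested comment describes.

```java
import java.nio.ByteOrder;

// Hypothetical illustration only; not the JDK's StringUTF16 implementation.
final class Utf16IndexSketch {
    // Assumption: two-byte units are stored in the platform's native byte order.
    static final boolean BIG_ENDIAN =
            ByteOrder.nativeOrder() == ByteOrder.BIG_ENDIAN;

    // Length in chars: half the number of bytes.
    static int length(byte[] val) {
        return val.length >> 1;
    }

    // 'index' counts chars; the byte offset is index * 2.
    static char getChar(byte[] val, int index) {
        int offset = index << 1;
        int b0 = val[offset] & 0xff;
        int b1 = val[offset + 1] & 0xff;
        return (char) (BIG_ENDIAN ? (b0 << 8) | b1 : (b1 << 8) | b0);
    }

    // 'index' counts chars here as well.
    static void putChar(byte[] val, int index, char c) {
        int offset = index << 1;
        byte hi = (byte) (c >> 8);
        byte lo = (byte) c;
        val[offset] = BIG_ENDIAN ? hi : lo;
        val[offset + 1] = BIG_ENDIAN ? lo : hi;
    }

    // Encodes a String into a byte array with two bytes per char.
    static byte[] toBytes(String s) {
        byte[] val = new byte[s.length() << 1];
        for (int i = 0; i < s.length(); i++) {
            putChar(val, i, s.charAt(i));
        }
        return val;
    }

    public static void main(String[] args) {
        byte[] val = toBytes("h\u00e9llo");
        // Ten bytes, but every index and size below is expressed in chars.
        System.out.println(val.length + " bytes, " + length(val) + " chars");
        System.out.println((int) getChar(val, 1)); // char index 1, byte offset 2
    }
}
```

Passing a byte offset where a char index is expected (for example `getChar(val, 2)` when the second char was intended) reads a different char, or runs past the end of the array near its tail, which is the kind of confusion the proposed documentation aims to prevent.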