On Thu, 24 Jul 2025 14:20:48 GMT, Chen Liang <li...@openjdk.org> wrote:

>> src/java.base/share/classes/java/lang/StringUTF16.java line 1490:
>> 
>>> 1488:                 val,
>>> 1489:                 Unsafe.ARRAY_BYTE_BASE_OFFSET + ((long) index << 1),
>>> 1490:                 (long) (end - off) << 1);
>> 
>> The documentation of `copyMemory()` is not super-clear about endianness.
>> But it seems to imply that in this case it behaves as if it were to copy 
>> `short`s, so endianness seems to be preserved.
>> 
>> The invocation of `copyMemory()` here implicitly assumes that
>> `ARRAY_CHAR_INDEX_SCALE` and `ARRAY_BYTE_INDEX_SCALE` are 2 and 1, resp.,
>> which seems quite reasonable but is not set in stone.
>
> I recall that the runtime requires a UTF16 byte array and a char array to
> have exactly the same layout. It would be nice to keep this in the design
> notes for the string implementation classes, such as in the class header.
> 
> (Useful notes could include that indices are char-based, that UTF16 byte[]
> and char[] have identical layout, etc.)

The StringUTF16.getChar and putChar methods are carefully written to use the
platform endianness: they compose a char value from, and decompose it into,
two bytes of the byte[] by shifting the lower and upper bytes accordingly.
How that layout maps onto other APIs that try to optimize between char[] and
the compact string byte[] is less well documented.
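
To make the layout assumption concrete, here is a minimal sketch of the
shift-based approach described above. It is not the JDK source: the class name
`Utf16Sketch` and the `putChars` helper are hypothetical, and the bulk copy
uses a native-order `CharBuffer` view rather than `Unsafe.copyMemory()`. It
only illustrates that a per-char store with endian-dependent shifts and a
native-order bulk copy should produce the same bytes.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.CharBuffer;
import java.util.Arrays;

public class Utf16Sketch {
    // Shift applied to the byte at the lower address and to the byte at the
    // higher address, chosen once from the platform byte order.
    static final int HI_BYTE_SHIFT;
    static final int LO_BYTE_SHIFT;
    static {
        boolean bigEndian = ByteOrder.nativeOrder() == ByteOrder.BIG_ENDIAN;
        HI_BYTE_SHIFT = bigEndian ? 8 : 0;
        LO_BYTE_SHIFT = bigEndian ? 0 : 8;
    }

    // Compose the char at char-based index `index` from two bytes.
    static char getChar(byte[] val, int index) {
        int i = index << 1;
        return (char) (((val[i] & 0xff) << HI_BYTE_SHIFT)
                     | ((val[i + 1] & 0xff) << LO_BYTE_SHIFT));
    }

    // Decompose `c` into two bytes at char-based index `index`.
    static void putChar(byte[] val, int index, int c) {
        int i = index << 1;
        val[i] = (byte) (c >> HI_BYTE_SHIFT);
        val[i + 1] = (byte) (c >> LO_BYTE_SHIFT);
    }

    // Hypothetical bulk copy of src[off, end) to char-based index `index`,
    // standing in for a raw memory copy in native byte order.
    static void putChars(byte[] val, int index, char[] src, int off, int end) {
        CharBuffer view = ByteBuffer.wrap(val)
                .order(ByteOrder.nativeOrder())
                .asCharBuffer();
        view.position(index);
        view.put(src, off, end - off);
    }

    public static void main(String[] args) {
        char[] src = {'a', '\u20AC', '\uD83D', '\uDE00'};
        byte[] bulk = new byte[src.length << 1];
        byte[] single = new byte[src.length << 1];
        putChars(bulk, 0, src, 0, src.length);
        for (int i = 0; i < src.length; i++) {
            putChar(single, i, src[i]);
        }
        System.out.println(Arrays.equals(bulk, single));
    }
}
```

If the two arrays ever differed, the bulk path and the per-char path would
disagree about the layout, which is exactly the kind of invariant worth
writing down in the class header.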

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/24773#discussion_r2228721098
