On Wed, 11 Jun 2025 17:24:11 GMT, Per Minborg <pminb...@openjdk.org> wrote:
>> This PR proposes to improve the `MemorySegment.getString(long offset,
>> Charset charset)` method documentation with respect to multi-octet concerns.
>
> Per Minborg has updated the pull request incrementally with one additional
> commit since the last revision:
>
>   Improve wording

src/java.base/share/classes/java/lang/foreign/MemorySegment.java line 1307:

> 1305:      *     return new String(bytes, charset);
> 1306:      * }
> 1307:      * @implNote If the segment size is not evenly dividable by the number of octets used

I think the relevant concepts here are:

* a valid charset has a fixed encoding where each character is turned into N bytes
* the terminator is also N bytes long
* the number of bytes read is given by the result of the integer division `S / N`, where `S` is the size of the segment (because if we have a remainder R < N, then we know it can't be a valid terminator)

I'm not sure what you mean by that last sentence. Maybe that if you have `N = 4`, and you have `AA00`, `00BB`, those four zeros are not considered a terminator? I think speaking of alignment here is misleading, because we're not really suggesting that a terminator in a `N = 4` charset should start at an address that is 4-byte aligned -- we're just saying that the _offset_ at which that terminator starts (relative to the start of the segment) is a multiple of 4.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/25715#discussion_r2141145457
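
To make the offset-versus-alignment point concrete, here is a minimal sketch of the scan described in the bullets above. It works on a plain `byte[]` rather than a `MemorySegment`, and `TerminatorScan` and `findTerminator` are hypothetical names chosen for illustration; this is not the JDK implementation, only one way the described behavior could be modeled.

```java
// Illustrative sketch only: models the "terminator must start at a multiple
// of N" rule from the comment above. Names here are hypothetical.
final class TerminatorScan {

    // Returns the byte offset (relative to the start of the array) of the first
    // n-byte all-zero unit that begins at a multiple of n, or -1 if none exists.
    static int findTerminator(byte[] segment, int n) {
        // Only S / N whole units are examined; a trailing remainder R < N
        // cannot hold a complete terminator, so it is ignored.
        int usable = (segment.length / n) * n;
        for (int off = 0; off < usable; off += n) {
            boolean allZero = true;
            for (int i = 0; i < n; i++) {
                if (segment[off + i] != 0) {
                    allZero = false;
                    break;
                }
            }
            if (allZero) {
                return off;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // The "AA00", "00BB" case: four zero bytes that straddle two units.
        byte[] straddling = {0x41, 0x41, 0, 0, 0, 0, 0x42, 0x42};
        System.out.println(findTerminator(straddling, 4)); // prints -1

        // Four zero bytes starting at offset 4, a multiple of N = 4.
        byte[] terminated = {0x41, 0x41, 0x41, 0x41, 0, 0, 0, 0};
        System.out.println(findTerminator(terminated, 4)); // prints 4
    }
}
```

The array's base address never enters the computation; only the offset from the start of the data does, which is the distinction being drawn against "alignment".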