Re: RFR: 8345292: Improve javadocs for MemorySegment::getStrings defining word boundary cases [v4]

Maurizio Cimadamore Wed, 11 Jun 2025 14:29:53 -0700

On Wed, 11 Jun 2025 17:24:11 GMT, Per Minborg <pminb...@openjdk.org> wrote:


>> This PR proposes to improve the `MemorySegment.getString(long offset, 
>> Charset charset)` method documentation with respect to multi-octet concerns.
>
> Per Minborg has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Improve wording

src/java.base/share/classes/java/lang/foreign/MemorySegment.java line 1307:

> 1305:      *     return new String(bytes, charset);
> 1306:      * }
> 1307:      * @implNote If the segment size is not evenly dividable by the 
> number of octets used

I think the relevant concepts here are:
* a valid charset has a fixed encoding where each character is turned into N 
bytes
* the terminator is also N bytes long
* the number of `N`-byte units that can be read is given by the result of the 
integer division `S / N`, where `S` is the size of the segment (because if we 
have a remainder R < N, then we know it can't be a valid terminator)
* I'm not sure what you mean by that last sentence. Do you mean that, with `N = 
4` and the contents `AA00`, `00BB`, those four adjacent zeros are not considered 
a terminator? If so, I think speaking of alignment here is misleading, because 
we're not really suggesting that a terminator in an `N = 4` charset should start 
at an address that is 4-byte aligned -- we're just saying that the _offset_ at 
which that terminator starts (relative to the start of the segment) is a 
multiple of 4.
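
To make the points above concrete, here is a minimal sketch of the scan they describe. It uses a plain `byte[]` rather than a `MemorySegment`, and the class and method names (`TerminatorScan`, `findTerminator`) are hypothetical; this is not the actual `getString` implementation. It also assumes that `AA00`, `00BB` stands for the byte sequence `AA AA 00 00`, `00 00 BB BB`.

```java
public class TerminatorScan {

    /**
     * Hypothetical helper: returns the byte offset of the first all-zero
     * n-byte unit that starts at a multiple of n, or -1 if there is none.
     */
    public static int findTerminator(byte[] segment, int n) {
        // Integer division S / N: a trailing remainder of R < N bytes is
        // never examined, since it cannot hold a full terminator.
        int units = segment.length / n;
        for (int u = 0; u < units; u++) {
            int base = u * n; // offset relative to the start, multiple of n
            boolean allZero = true;
            for (int i = 0; i < n; i++) {
                if (segment[base + i] != 0) {
                    allZero = false;
                    break;
                }
            }
            if (allZero) {
                return base;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // N = 4, S = 13: AA AA 00 00 | 00 00 BB BB | 00 00 00 00 | 00
        byte[] data = {
            (byte) 0xAA, (byte) 0xAA, 0, 0,
            0, 0, (byte) 0xBB, (byte) 0xBB,
            0, 0, 0, 0,
            0
        };
        // The zeros at offsets 2..5 straddle two units and are skipped;
        // the terminator is the unit starting at offset 8.
        System.out.println(findTerminator(data, 4)); // prints 8
    }
}
```

Note that the check depends only on the offset within the array, not on the memory address at which the data happens to live.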

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/25715#discussion_r2141145457
