Re: RFR: 8369564: Provide a MemorySegment API to read strings with known lengths [v14]

ExE Boss Fri, 28 Nov 2025 14:03:06 -0800

On Tue, 25 Nov 2025 19:10:25 GMT, Liam Miller-Cushon <[email protected]> wrote:


>> This PR proposes adding a new overload to `MemorySegment::getString` that 
>> takes a known byte length of the content.
>> 
>> This was previously proposed in https://github.com/openjdk/jdk/pull/20725, 
>> but the outcome of 
>> [JDK-8333843](https://bugs.openjdk.org/browse/JDK-8333843) was to update 
>> `MemorySegment#getString` to suggest
>> 
>> 
>>     byte[] bytes = new byte[length];
>>     MemorySegment.copy(segment, JAVA_BYTE, offset, bytes, 0, length);
>>     return new String(bytes, charset);
>> 
>> 
>> However, this is less efficient than what the implementation of `getString` 
>> does after [JDK-8362893](https://bugs.openjdk.org/browse/JDK-8362893): it 
>> now uses `JavaLangAccess::uncheckedNewStringNoRepl` to avoid the copy.
>> 
>> See also the discussion in [this panama-dev@ thread](https://mail.openjdk.org/pipermail/panama-dev/2025-November/021193.html)
>> and mcimadamore's document [Pulling the (foreign) string](https://cr.openjdk.org/~mcimadamore/panama/strings_ffm.html).
>> 
>> Benchmark results:
>> 
>> 
>> Benchmark                                 (size)  Mode  Cnt    Score   Error  Units
>> ToJavaStringTest.jni_readString                5  avgt   30   55.339 ± 0.401  ns/op
>> ToJavaStringTest.jni_readString               20  avgt   30   59.887 ± 0.295  ns/op
>> ToJavaStringTest.jni_readString              100  avgt   30   84.288 ± 0.419  ns/op
>> ToJavaStringTest.jni_readString              200  avgt   30  119.275 ± 0.496  ns/op
>> ToJavaStringTest.jni_readString              451  avgt   30  193.106 ± 1.528  ns/op
>> ToJavaStringTest.panama_copyLength             5  avgt   30    7.348 ± 0.048  ns/op
>> ToJavaStringTest.panama_copyLength            20  avgt   30    7.440 ± 0.125  ns/op
>> ToJavaStringTest.panama_copyLength           100  avgt   30   11.766 ± 0.058  ns/op
>> ToJavaStringTest.panama_copyLength           200  avgt   30   16.096 ± 0.089  ns/op
>> ToJavaStringTest.panama_copyLength           451  avgt   30   25.844 ± 0.054  ns/op
>> ToJavaStringTest.panama_readString             5  avgt   30    5.857 ± 0.046  ns/op
>> ToJavaStringTest.panama_readString            20  avgt   30    7.750 ± 0.046  ns/op
>> ToJavaStringTest.panama_readString           100  avgt   30   14.109 ± 0.187  ns/op
>> ToJavaStringTest.panama_readString           200  avgt   30   18.035 ± 0.130  ns/op
>> ToJavaStringTest.panama_readString           451  avgt   30   35.896 ± 0.227  ns/op
>> ToJavaStringTest.panama_readStringLength       5  avgt   30    4.565 ± 0.038  ns/op
>> ToJavaStringTest.panama_readStringLength      20...
>
> Liam Miller-Cushon has updated the pull request incrementally with one 
> additional commit since the last revision:
> 
>   Review feedback
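
For reference, the copy-based workaround quoted above can be written as a self-contained program. This is a minimal sketch assuming JDK 22 or later, where `java.lang.foreign` is final. The class `KnownLengthRead` and its `readString` helper are hypothetical names, and the sketch deliberately uses only existing APIs rather than the `getString(long, Charset, long)` overload proposed in this PR. The intermediate `byte[]` it allocates is the extra copy that the proposed overload is meant to avoid.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class KnownLengthRead {

    // Hypothetical helper: decode exactly `length` bytes starting at `offset`,
    // without searching for a NUL terminator. This is the copy-based
    // workaround: the bytes are first copied into a heap array, then decoded.
    static String readString(MemorySegment segment, long offset,
                             Charset charset, int length) {
        byte[] bytes = new byte[length];
        MemorySegment.copy(segment, ValueLayout.JAVA_BYTE, offset, bytes, 0, length);
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        try (Arena arena = Arena.ofConfined()) {
            // Store "hello world" in native memory without a terminator.
            byte[] data = "hello world".getBytes(StandardCharsets.UTF_8);
            MemorySegment segment = arena.allocate(data.length);
            MemorySegment.copy(data, 0, segment, ValueLayout.JAVA_BYTE, 0, data.length);

            // Read the 5 bytes starting at byte offset 6.
            System.out.println(readString(segment, 6, StandardCharsets.UTF_8, 5)); // prints "world"
        }
    }
}
```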

## Re **CSR**:

diff -U 3 a/JDK‑8372338.md b/JDK‑8372338.md
--- a/JDK‑8372338.md
+++ b/JDK‑8372338.md
@@ -15,7 +15,7 @@
 Solution
 --------

-This change adds threw new methods to support efficient handling of non-null terminated strings:
+This change adds three new methods to support efficient handling of non-null terminated strings:

 * `MemorySegment#getString(long offset, Charset charset, long length)`
 * `MemorySegment#copy(String src, Charset dstEncoding, int srcIndex, MemorySegment dst, long dstOffset, int numChars)`
@@ -28,11 +28,11 @@
 2. number of code units
 3. the number of characters in the resulting string

-(3) was was rejected because for variable length encodings it requires a decoding step to convert to bytes for a bulk copy operation. This leaves (1) and (2) as candidates -- since the conversion between the two is a trivial scaling factor, either would have been a viable choice. Code units might be more natural for native strings encoded as an array of code units. Using a byte length was [decided on](https://mail.openjdk.org/pipermail/panama-dev/2025-November/021215.html) to allow supporting arbitrary charsets, since not all charsets may have a concept of a code unit.
+(3) was rejected because for variable length encodings it requires a decoding step to convert to bytes for a bulk copy operation. This leaves (1) and (2) as candidates -- since the conversion between the two is a trivial scaling factor, either would have been a viable choice. Code units might be more natural for native strings encoded as an array of code units. Using a byte length was [decided on](https://mail.openjdk.org/pipermail/panama-dev/2025-November/021215.html) to allow supporting arbitrary charsets, since not all charsets may have a concept of a code unit.

 For `copy` and `allocateFrom`, the `srcIndex` and `numChars` are expressed in terms of character offsets into the string. This is the only practical choice here, since the client already has a Java string, and computing an offset in bytes or code units would require additional computation.

-The new `copy` method is the dual of the new `getString`, and allows writing strings to a target memory segment without a terminator. There was a potential analogy to the existing `MemorySegment#setString` methods here, but they write strings with null terminators. This operation is more in common with the other `copy` overloads, where here a String is the source of data Strings (as opposed to e.g. an array).
+The new `copy` method is the dual of the new `getString`, and allows writing strings to a target memory segment without a terminator. There was a potential analogy to the existing `MemorySegment#setString` methods here, but they write strings with null terminators. This operation has more in common with the other `copy` overloads, except that here a String is the source of data (as opposed to e.g. an array).

 Specification
 -------------
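
The trade-off between the three length units discussed in the CSR can be illustrated with plain `java.nio` APIs. In this sketch, byte counts and code-unit counts differ only by the fixed code-unit size (1 byte for UTF-8, 2 bytes for UTF-16), whereas turning a count of Java `char`s into a byte count requires walking the encoded bytes, which is the decoding step mentioned for option (3). The class `LengthUnits` and its helper `utf8BytesForChars` are hypothetical, and the helper assumes well-formed UTF-8 and a `numChars` that does not split a surrogate pair.

```java
import java.nio.charset.StandardCharsets;

public class LengthUnits {

    // Hypothetical helper: how many UTF-8 bytes encode the first `numChars`
    // Java chars? It has to inspect every lead byte, i.e. partially decode.
    // Assumes well-formed UTF-8; stops early if the input runs out.
    static int utf8BytesForChars(byte[] utf8, int numChars) {
        int bytes = 0;
        int chars = 0;
        while (chars < numChars && bytes < utf8.length) {
            int lead = utf8[bytes] & 0xFF;
            if (lead < 0x80) {          // 1-byte sequence -> 1 char
                bytes += 1; chars += 1;
            } else if (lead < 0xE0) {   // 2-byte sequence -> 1 char
                bytes += 2; chars += 1;
            } else if (lead < 0xF0) {   // 3-byte sequence -> 1 char
                bytes += 3; chars += 1;
            } else {                    // 4-byte sequence -> surrogate pair (2 chars)
                bytes += 4; chars += 2;
            }
        }
        return bytes;
    }

    public static void main(String[] args) {
        // "a", "é" (U+00E9) and U+1F600 (a surrogate pair in Java)
        String s = "a\u00e9\uD83D\uDE00";
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        byte[] utf16 = s.getBytes(StandardCharsets.UTF_16LE); // LE: no byte order mark

        System.out.println(s.length());                 // 4 chars
        System.out.println(utf8.length);                // 7 bytes = 7 UTF-8 code units
        System.out.println(utf16.length);               // 8 bytes
        System.out.println(utf16.length / 2);           // 4 UTF-16 code units
        System.out.println(utf8BytesForChars(utf8, 2)); // 3 bytes for "a" + "é"
        System.out.println(utf8BytesForChars(utf8, 4)); // 7 bytes for the whole string
    }
}
```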

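The write direction can be approximated with existing APIs as well. This is a minimal sketch assuming JDK 22 or later; `UnterminatedWrite.copyString` is a hypothetical helper, not the proposed `MemorySegment#copy(String, Charset, int, MemorySegment, long, int)`, and unlike that method it encodes through an intermediate `byte[]`. Following the CSR, `srcIndex` and `numChars` are character offsets into the Java string. The program contrasts the helper with `setString`, which appends a null terminator: both segments are pre-filled with `'x'`, and only the `setString` result has a zero byte right after `"hello"`.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class UnterminatedWrite {

    // Hypothetical helper: encode chars [srcIndex, srcIndex + numChars) of `src`
    // and copy the resulting bytes into `dst` at `dstOffset`, with no terminator.
    // Returns the number of bytes written.
    static long copyString(String src, Charset dstEncoding, int srcIndex,
                           MemorySegment dst, long dstOffset, int numChars) {
        byte[] bytes = src.substring(srcIndex, srcIndex + numChars).getBytes(dstEncoding);
        MemorySegment.copy(bytes, 0, dst, ValueLayout.JAVA_BYTE, dstOffset, bytes.length);
        return bytes.length;
    }

    public static void main(String[] args) {
        try (Arena arena = Arena.ofConfined()) {
            // setString writes "hello" followed by a NUL terminator.
            MemorySegment terminated = arena.allocate(8).fill((byte) 'x');
            terminated.setString(0, "hello", StandardCharsets.UTF_8);
            System.out.println(terminated.get(ValueLayout.JAVA_BYTE, 5)); // 0

            // The helper writes only the encoded bytes; byte 5 keeps its old value.
            MemorySegment raw = arena.allocate(8).fill((byte) 'x');
            long written = copyString("hello", StandardCharsets.UTF_8, 0, raw, 0, 5);
            System.out.println(written);                                  // 5
            System.out.println(raw.get(ValueLayout.JAVA_BYTE, 5));        // 120 ('x')
        }
    }
}
```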
-------------

PR Comment: https://git.openjdk.org/jdk/pull/28043#issuecomment-3590599626
