On Tue, 25 Nov 2025 19:10:25 GMT, Liam Miller-Cushon <[email protected]> wrote:
>> This PR proposes adding a new overload to `MemorySegment::getString` that takes a known byte length of the content.
>>
>> This was previously proposed in https://github.com/openjdk/jdk/pull/20725, but the outcome of [JDK-8333843](https://bugs.openjdk.org/browse/JDK-8333843) was to update `MemorySegment#getString` to suggest
>>
>>
>> byte[] bytes = new byte[length];
>> MemorySegment.copy(segment, JAVA_BYTE, offset, bytes, 0, length);
>> return new String(bytes, charset);
>>
>>
>> However this is less efficient than what the implementation of getString does after [JDK-8362893](https://bugs.openjdk.org/browse/JDK-8362893), it now uses `JavaLangAccess::uncheckedNewStringNoRepl` to avoid the copy.
>>
>> See also discussion in [this panama-dev@ thread](https://mail.openjdk.org/pipermail/panama-dev/2025-November/021193.html), and mcimadamore's document [Pulling the (foreign) string](https://cr.openjdk.org/~mcimadamore/panama/strings_ffm.html)
>>
>> Benchmark results:
>>
>>
>> Benchmark                                (size)  Mode  Cnt    Score   Error  Units
>> ToJavaStringTest.jni_readString               5  avgt   30   55.339 ± 0.401  ns/op
>> ToJavaStringTest.jni_readString              20  avgt   30   59.887 ± 0.295  ns/op
>> ToJavaStringTest.jni_readString             100  avgt   30   84.288 ± 0.419  ns/op
>> ToJavaStringTest.jni_readString             200  avgt   30  119.275 ± 0.496  ns/op
>> ToJavaStringTest.jni_readString             451  avgt   30  193.106 ± 1.528  ns/op
>> ToJavaStringTest.panama_copyLength            5  avgt   30    7.348 ± 0.048  ns/op
>> ToJavaStringTest.panama_copyLength           20  avgt   30    7.440 ± 0.125  ns/op
>> ToJavaStringTest.panama_copyLength          100  avgt   30   11.766 ± 0.058  ns/op
>> ToJavaStringTest.panama_copyLength          200  avgt   30   16.096 ± 0.089  ns/op
>> ToJavaStringTest.panama_copyLength          451  avgt   30   25.844 ± 0.054  ns/op
>> ToJavaStringTest.panama_readString            5  avgt   30    5.857 ± 0.046  ns/op
>> ToJavaStringTest.panama_readString           20  avgt   30    7.750 ± 0.046  ns/op
>> ToJavaStringTest.panama_readString          100  avgt   30   14.109 ± 0.187  ns/op
>> ToJavaStringTest.panama_readString          200  avgt   30   18.035 ± 0.130  ns/op
>> ToJavaStringTest.panama_readString          451  avgt   30   35.896 ± 0.227  ns/op
>> ToJavaStringTest.panama_readStringLength      5  avgt   30    4.565 ± 0.038  ns/op
>> ToJavaStringTest.panama_readStringLength     20...
>
> Liam Miller-Cushon has updated the pull request incrementally with one additional commit since the last revision:
>
>   Review feedback

## Re **CSR**:

diff -U 3 a/JDK‑8372338.md b/JDK‑8372338.md
--- a/JDK‑8372338.md
+++ b/JDK‑8372338.md
@@ -15,7 +15,7 @@
 Solution
 --------
 
-This change adds threw new methods to support efficient handling of non-null terminated strings:
+This change adds three new methods to support efficient handling of non-null terminated strings:
 
 * `MemorySegment#getString(long offset, Charset charset, long length)`
 * `MemorySegment#copy(String src, Charset dstEncoding, int srcIndex, MemorySegment dst, long dstOffset, int numChars)`
@@ -28,11 +28,11 @@
 2. number of code units
 3. the number of characters in the resulting string
 
-(3) was was rejected because for variable length encodings it requires a decoding step to convert to bytes for a bulk copy operation. This leaves (1) and (2) as candidates -- since the conversion between the two is a trivial scaling factor, either would have been a viable choice. Code units might be more natural for native strings encoded as an array of code units. Using a byte length was [decided on](https://mail.openjdk.org/pipermail/panama-dev/2025-November/021215.html) to allow supporting arbitrary charsets, since not all charsets may have a concept of a code unit.
+(3) was rejected because for variable length encodings it requires a decoding step to convert to bytes for a bulk copy operation. This leaves (1) and (2) as candidates -- since the conversion between the two is a trivial scaling factor, either would have been a viable choice. Code units might be more natural for native strings encoded as an array of code units. Using a byte length was [decided on](https://mail.openjdk.org/pipermail/panama-dev/2025-November/021215.html) to allow supporting arbitrary charsets, since not all charsets may have a concept of a code unit.
 
 For `copy` and `allocateFrom`, the `srcIndex` and `numChars` are expressed in terms of character offsets into the string. This is the only practical choice here, since the client already has a Java string, and computing an offset in bytes or code units would require additional computation.
 
-The new `copy` method is the dual of the new `getString`, and allows writing strings to a target memory segment without a terminator. There was a potential analogy to the existing `MemorySegment#setString` methods here, but they write strings with null terminators. This operation is more in common with the other `copy` overloads, where here a String is the source of data Strings (as opposed to e.g. an array).
+The new `copy` method is the dual of the new `getString`, and allows writing strings to a target memory segment without a terminator. There was a potential analogy to the existing `MemorySegment#setString` methods here, but they write strings with null terminators. This operation is more in common with the other `copy` overloads, where here a String is the source of data (as opposed to e.g. an array).
 
 Specification
 -------------

-------------

PR Comment: https://git.openjdk.org/jdk/pull/28043#issuecomment-3590599626
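To make the copy-based workaround from the `getString` javadoc concrete, here is a minimal, self-contained sketch, assuming JDK 22 or later (where the FFM API is final). The class name `KnownLengthStrings` and the helper `readString` are hypothetical. The helper deliberately does not call the proposed `getString(long offset, Charset charset, long length)` overload, which only exists if this PR is integrated. It takes an `int` byte length because it goes through a temporary `byte[]`, and that intermediate array is the extra copy the new overload is meant to avoid.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class KnownLengthStrings {

    // Hypothetical helper: decodes exactly byteLength bytes starting at offset.
    // No null terminator is required or searched for. This is the copy-based
    // workaround: the bytes pass through a temporary byte[] before the String is built.
    static String readString(MemorySegment segment, long offset, Charset charset, int byteLength) {
        byte[] bytes = new byte[byteLength];
        MemorySegment.copy(segment, ValueLayout.JAVA_BYTE, offset, bytes, 0, byteLength);
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        try (Arena arena = Arena.ofConfined()) {
            // Two strings packed back to back, with no terminator between them.
            byte[] packed = "helloworld".getBytes(StandardCharsets.UTF_8);
            MemorySegment segment = arena.allocate(packed.length);
            MemorySegment.copy(packed, 0, segment, ValueLayout.JAVA_BYTE, 0, packed.length);

            System.out.println(readString(segment, 0, StandardCharsets.UTF_8, 5)); // hello
            System.out.println(readString(segment, 5, StandardCharsets.UTF_8, 5)); // world
        }
    }
}
```

Reading `"hello"` and `"world"` from one packed buffer shows why the terminator-based `getString(long)` cannot be used here: there is no zero byte between the two strings, so the caller has to supply the byte length.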

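The CSR's case for a byte length over a character count, and for a `copy` that writes no terminator, can be checked with a second short sketch, again assuming JDK 22 or later. The hypothetical helper `copyUnterminated` only mimics the parameter shape of the proposed `copy(String src, Charset dstEncoding, int srcIndex, MemorySegment dst, long dstOffset, int numChars)`, with char-based `srcIndex`/`numChars` and no terminator written. It is not that method: it encodes through a temporary `byte[]`.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class UnterminatedCopy {

    // Hypothetical helper: encodes chars [srcIndex, srcIndex + numChars) of src and
    // copies the resulting bytes into dst at dstOffset. No terminator is written.
    // Returns the byte count, which a reader needs to decode the string again.
    static int copyUnterminated(String src, Charset charset, int srcIndex,
                                MemorySegment dst, long dstOffset, int numChars) {
        byte[] bytes = src.substring(srcIndex, srcIndex + numChars).getBytes(charset);
        MemorySegment.copy(bytes, 0, dst, ValueLayout.JAVA_BYTE, dstOffset, bytes.length);
        return bytes.length;
    }

    public static void main(String[] args) {
        // Same character count, different UTF-8 byte counts: "naive" vs "na\u00efve".
        for (String s : new String[] {"naive", "na\u00efve"}) {
            int utf8 = s.getBytes(StandardCharsets.UTF_8).length;
            int utf16 = s.getBytes(StandardCharsets.UTF_16LE).length;
            System.out.println(s.length() + " chars, " + utf8 + " UTF-8 bytes, "
                    + utf16 + " UTF-16LE bytes");
        }
        // prints: 5 chars, 5 UTF-8 bytes, 10 UTF-16LE bytes
        //         5 chars, 6 UTF-8 bytes, 10 UTF-16LE bytes

        try (Arena arena = Arena.ofConfined()) {
            String s = "na\u00efve";
            // The segment is larger than the string; the returned byte count,
            // not the segment size or a terminator, bounds the decoded text.
            MemorySegment dst = arena.allocate(16);
            int written = copyUnterminated(s, StandardCharsets.UTF_8, 0, dst, 0, s.length());
            byte[] all = dst.toArray(ValueLayout.JAVA_BYTE);
            String back = new String(all, 0, written, StandardCharsets.UTF_8);
            System.out.println(written + " bytes written, round trip ok: " + back.equals(s));
            // prints: 6 bytes written, round trip ok: true
        }
    }
}
```

The UTF-16LE size is always the code-unit count times two (the "trivial scaling factor" between options (1) and (2)), while the UTF-8 size of a 5-char string cannot be known without looking at its contents, which is the decoding step that ruled out option (3). The sketch assumes the char range does not split a surrogate pair; if it did, `getBytes` would replace the lone surrogate with the charset's replacement bytes.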