On Fri, 30 Jan 2026 15:56:20 GMT, Liam Miller-Cushon <[email protected]> wrote:
>> This implements an API to return the byte length of a String encoded in a
>> given charset. See
>> [JDK-8372353](https://bugs.openjdk.org/browse/JDK-8372353) for background.
>>
>> ---
>>
>>
>> Benchmark                              (encoding)  (stringLength)   Mode  Cnt          Score          Error  Units
>> StringLoopJmhBenchmark.getBytes             ASCII              10  thrpt    5  406782650.595 ± 16960032.852  ops/s
>> StringLoopJmhBenchmark.getBytes             ASCII             100  thrpt    5  172936926.189 ±  4532029.201  ops/s
>> StringLoopJmhBenchmark.getBytes             ASCII            1000  thrpt    5   38830681.232 ±  2413274.766  ops/s
>> StringLoopJmhBenchmark.getBytes             ASCII          100000  thrpt    5     458881.155 ±    12818.317  ops/s
>> StringLoopJmhBenchmark.getBytes            LATIN1              10  thrpt    5   37193762.990 ±  3962947.391  ops/s
>> StringLoopJmhBenchmark.getBytes            LATIN1             100  thrpt    5   55400876.236 ±  1267331.434  ops/s
>> StringLoopJmhBenchmark.getBytes            LATIN1            1000  thrpt    5   11104514.001 ±    41718.545  ops/s
>> StringLoopJmhBenchmark.getBytes            LATIN1          100000  thrpt    5     182535.414 ±    10296.120  ops/s
>> StringLoopJmhBenchmark.getBytes             UTF16              10  thrpt    5  113474681.457 ±  8326589.199  ops/s
>> StringLoopJmhBenchmark.getBytes             UTF16             100  thrpt    5   37854103.127 ±  4808526.773  ops/s
>> StringLoopJmhBenchmark.getBytes             UTF16            1000  thrpt    5    4139833.009 ±    70636.784  ops/s
>> StringLoopJmhBenchmark.getBytes             UTF16          100000  thrpt    5      57644.637 ±     1887.112  ops/s
>> StringLoopJmhBenchmark.getBytesLength       ASCII              10  thrpt    5  946701647.247 ± 76938927.141  ops/s
>> StringLoopJmhBenchmark.getBytesLength       ASCII             100  thrpt    5  396615374.479 ± 15167234.884  ops/s
>> StringLoopJmhBenchmark.getBytesLength       ASCII            1000  thrpt    5  100464784.979 ±   794027.897  ops/s
>> StringLoopJmhBenchmark.getBytesLength       ASCII          100000  thrpt    5    1215487.689 ±     1916.468  ops/s
>> StringLoopJmhBenchmark.getBytesLength      LATIN1              10  thrpt    5  221265102.323 ± 17013983.056  ops/s
>> StringLoopJmhBenchmark.getBytesLength      LATIN1             100  thrpt    5  137617873.887 ±  5842185.781  ops/s
>> StringLoopJmhBenchmark.getBytesLength      LATIN1            1000  thrpt    5   92540259.1...
>
> Liam Miller-Cushon has updated the pull request incrementally with one
> additional commit since the last revision:
>
>   Rename getBytesLength to getByteLength

> For completeness, here's a demo of it in `CharsetEncoder` (#29639). As
> expected it's possible to implement it that way and preserve equivalent
> performance, by adding a package visibility method to `String` and using
> `JavaLangAccess`. With that change, `string.getByteLength(UTF_8)` could be
> expressed as:
>
> ```java
> try {
>     int byteLength = StandardCharsets.UTF_8.newEncoder()
>         .onUnmappableCharacter(CodingErrorAction.REPLACE)
>         .onMalformedInput(CodingErrorAction.REPLACE)
>         .getByteLength(stringData);
> } catch (CharacterCodingException e) {
>     throw new IllegalStateException(e);
> }
> ```
>
> I can update the CSR to document this as an alternative.

This looks verbose at first sight, but I like how it allows control over the coding error actions: it enables input validation and length computation in a single pass. Your demo seems to optimize only for `CodingErrorAction.REPLACE`, but that's probably more of an implementation detail than a limiting factor of the API design, right?

The demo focuses on the encoding side, but for completeness I guess the decoding side (with validation) could look like:

> ```java
> try {
>     int stringLength = StandardCharsets.UTF_8.newDecoder()
>         .onUnmappableCharacter(CodingErrorAction.REPORT)
>         .onMalformedInput(CodingErrorAction.REPORT)
>         .getDecodedLength(stringData);
> } catch (CharacterCodingException e) {
>     throw new IllegalStateException(e);
> }
> ```

Did creating the stateful `CharsetEncoder` meaningfully affect your benchmark results?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/28454#issuecomment-3877259235
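
P.S. To make the "validate and count in a single pass" idea concrete without relying on the proposed method, here is a rough sketch using only the existing `CharsetEncoder.encode(CharBuffer, ByteBuffer, boolean)` API with `CodingErrorAction.REPORT`. The class and method names (`EncodedLength`, `strictByteLength`) are hypothetical and are not part of this PR or of #29639; the sketch encodes into a small reused scratch buffer and is only meant to illustrate the semantics, not the performance, of a dedicated length method.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;

// Hypothetical helper, for illustration only.
final class EncodedLength {
    private EncodedLength() {}

    /**
     * Returns the number of bytes {@code text} would occupy in {@code charset},
     * throwing if the input is malformed or unmappable (REPORT semantics).
     */
    static long strictByteLength(CharSequence text, Charset charset)
            throws CharacterCodingException {
        CharsetEncoder encoder = charset.newEncoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        CharBuffer in = CharBuffer.wrap(text);
        // Scratch space that is reused; the encoded bytes are discarded.
        ByteBuffer scratch = ByteBuffer.allocate(1024);
        long total = 0;

        // Encode chunk by chunk until all input has been consumed.
        while (true) {
            CoderResult result = encoder.encode(in, scratch, true);
            total += scratch.position();
            scratch.clear();
            if (result.isUnderflow()) {
                break;
            }
            if (result.isError()) {
                result.throwException(); // malformed or unmappable input
            }
            // Otherwise the scratch buffer overflowed: loop and keep going.
        }

        // Some encoders emit trailing bytes on flush (e.g. shift sequences).
        while (true) {
            CoderResult result = encoder.flush(scratch);
            total += scratch.position();
            scratch.clear();
            if (result.isUnderflow()) {
                break;
            }
            if (result.isError()) {
                result.throwException();
            }
        }
        return total;
    }
}
```

For well-formed input this should agree with `text.toString().getBytes(charset).length`, while an unpaired surrogate surfaces as a `CharacterCodingException` instead of being silently replaced.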
