Re: RFR: 8372353: API to compute the byte length of a String encoded in a given Charset [v21]

Alan Bateman Mon, 02 Mar 2026 23:37:02 -0800

On Tue, 3 Mar 2026 00:41:42 GMT, Liam Miller-Cushon <[email protected]> wrote:


>> This implements an API to return the byte length of a String encoded in a 
>> given charset. See 
>> [JDK-8372353](https://bugs.openjdk.org/browse/JDK-8372353) for background.
>> 
>> ---
>> 
>> 
>> Benchmark                              (encoding)  (stringLength)   Mode  Cnt          Score          Error  Units
>> StringLoopJmhBenchmark.getBytes             ASCII              10  thrpt    5  406782650.595 ± 16960032.852  ops/s
>> StringLoopJmhBenchmark.getBytes             ASCII             100  thrpt    5  172936926.189 ±  4532029.201  ops/s
>> StringLoopJmhBenchmark.getBytes             ASCII            1000  thrpt    5   38830681.232 ±  2413274.766  ops/s
>> StringLoopJmhBenchmark.getBytes             ASCII          100000  thrpt    5     458881.155 ±    12818.317  ops/s
>> StringLoopJmhBenchmark.getBytes            LATIN1              10  thrpt    5   37193762.990 ±  3962947.391  ops/s
>> StringLoopJmhBenchmark.getBytes            LATIN1             100  thrpt    5   55400876.236 ±  1267331.434  ops/s
>> StringLoopJmhBenchmark.getBytes            LATIN1            1000  thrpt    5   11104514.001 ±    41718.545  ops/s
>> StringLoopJmhBenchmark.getBytes            LATIN1          100000  thrpt    5     182535.414 ±    10296.120  ops/s
>> StringLoopJmhBenchmark.getBytes             UTF16              10  thrpt    5  113474681.457 ±  8326589.199  ops/s
>> StringLoopJmhBenchmark.getBytes             UTF16             100  thrpt    5   37854103.127 ±  4808526.773  ops/s
>> StringLoopJmhBenchmark.getBytes             UTF16            1000  thrpt    5    4139833.009 ±    70636.784  ops/s
>> StringLoopJmhBenchmark.getBytes             UTF16          100000  thrpt    5      57644.637 ±     1887.112  ops/s
>> StringLoopJmhBenchmark.getBytesLength       ASCII              10  thrpt    5  946701647.247 ± 76938927.141  ops/s
>> StringLoopJmhBenchmark.getBytesLength       ASCII             100  thrpt    5  396615374.479 ± 15167234.884  ops/s
>> StringLoopJmhBenchmark.getBytesLength       ASCII            1000  thrpt    5  100464784.979 ±   794027.897  ops/s
>> StringLoopJmhBenchmark.getBytesLength       ASCII          100000  thrpt    5    1215487.689 ±     1916.468  ops/s
>> StringLoopJmhBenchmark.getBytesLength      LATIN1              10  thrpt    5  221265102.323 ± 17013983.056  ops/s
>> StringLoopJmhBenchmark.getBytesLength      LATIN1             100  thrpt    5  137617873.887 ±  5842185.781  ops/s
>> StringLoopJmhBenchmark.getBytesLength      LATIN1            1000  thrpt    5   92540259.1...
>
> Liam Miller-Cushon has updated the pull request with a new target base due to 
> a merge or a rebase. The incremental webrev excludes the unrelated changes 
> brought in by the merge/rebase. The pull request contains 25 additional 
> commits since the last revision:
> 
>  - Merge remote-tracking branch 'origin/master' into strlen
>  - Update copyright year, and add bug number
>  - Merge remote-tracking branch 'origin/master' into strlen
>  - Rename to encodedLength
>  - Rename to getEncodedLength
>  - Merge remote-tracking branch 'origin/master' into strlen
>  - Rename getBytesLength to getByteLength
>  - Update javadoc to refer to 'this {@code String}', not 'the given String'
>  - Clarify that "It" in the javadoc means "This method"
>  - Remove paragraph break
>  - ... and 15 more: https://git.openjdk.org/jdk/compare/9687b8a3...d0301f0b

src/java.base/share/classes/java/lang/String.java line 2114:

> 2112:      *
> 2113:      * @apiNote This method provides equivalent or better performance than {@link #getBytes(Charset)
> 2114:      *          getBytes(cs).length}. This method may allocate memory to compute the length for some charsets.

I think it would be better to drop "This method may allocate memory to compute 
the length for some charsets" from the apiNote. It looks very out of place in 
the String docs, which never speak of memory usage.
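
For context on the allocation point: `getBytes(cs).length` builds the whole encoded byte array only to read its length, whereas for some encodings the length can be counted directly from the chars. The following is a minimal sketch for UTF-8 only, under the assumption that unpaired surrogates are replaced by a single `?` byte as `getBytes` does. `utf8Length` is a hypothetical helper written for illustration, not the implementation in this PR.

```java
final class Utf8LengthSketch {
    // Hypothetical helper (not from the PR): counts the UTF-8 encoded length
    // of a String without allocating the encoded byte array.
    // Note: the int result can overflow for extremely long strings.
    static int utf8Length(String s) {
        int bytes = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < 0x80) {
                bytes += 1;                       // ASCII
            } else if (c < 0x800) {
                bytes += 2;                       // two-byte sequence
            } else if (Character.isHighSurrogate(c)
                    && i + 1 < s.length()
                    && Character.isLowSurrogate(s.charAt(i + 1))) {
                bytes += 4;                       // valid surrogate pair
                i++;                              // skip the low surrogate
            } else if (Character.isSurrogate(c)) {
                bytes += 1;                       // unpaired surrogate, encoded as '?'
            } else {
                bytes += 3;                       // rest of the BMP
            }
        }
        return bytes;
    }
}
```

Charsets without such a closed-form rule would presumably need to run an encoder into a scratch buffer, which may be the case the removed sentence was describing.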

src/java.base/share/classes/java/lang/String.java line 2119:

> 2117:      * @since 27
> 2118:      */
> 2119:     public int encodedLength(Charset cs) {

I see the method name has changed since the early iterations. Thanks to Eirik 
Bjørsnøs for asking good questions on this point.
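
As a point of reference for the signature above, the baseline that the quoted apiNote compares against can be written with existing API. This sketch uses only `getBytes(Charset)`, which is available today. The class and method names are hypothetical, and the comment about `encodedLength` is an assumption drawn from the quoted javadoc, since that method is only proposed here (`@since 27`).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class EncodedLengthBaseline {
    // Baseline: encode the whole String, then read the array length.
    // Assumption from the quoted apiNote: the proposed s.encodedLength(cs)
    // is meant to return this same value without necessarily encoding.
    static int viaGetBytes(String s, Charset cs) {
        return s.getBytes(cs).length;
    }

    public static void main(String[] args) {
        String s = "Bj\u00f8rsn\u00f8s";
        System.out.println(viaGetBytes(s, StandardCharsets.UTF_8));
        System.out.println(viaGetBytes(s, StandardCharsets.ISO_8859_1));
        System.out.println(viaGetBytes(s, StandardCharsets.UTF_16));
    }
}
```

The result depends on the charset: the same eight-char string has a different byte length in each of the three encodings, and UTF-16 via `getBytes` includes a byte-order mark.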

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/28454#discussion_r2876638573
PR Review Comment: https://git.openjdk.org/jdk/pull/28454#discussion_r2876643192
