Re: RFR: 8372353: API to compute the byte length of a String encoded in a given Charset [v17]

Eirik Bjørsnøs Fri, 06 Feb 2026 05:51:43 -0800

On Fri, 30 Jan 2026 15:56:20 GMT, Liam Miller-Cushon <[email protected]> wrote:


>> This implements an API to return the byte length of a String encoded in a 
>> given charset. See 
>> [JDK-8372353](https://bugs.openjdk.org/browse/JDK-8372353) for background.
>> 
>> ---
>> 
>> 
>> Benchmark                              (encoding)  (stringLength)   Mode  Cnt          Score          Error  Units
>> StringLoopJmhBenchmark.getBytes             ASCII              10  thrpt    5  406782650.595 ± 16960032.852  ops/s
>> StringLoopJmhBenchmark.getBytes             ASCII             100  thrpt    5  172936926.189 ±  4532029.201  ops/s
>> StringLoopJmhBenchmark.getBytes             ASCII            1000  thrpt    5   38830681.232 ±  2413274.766  ops/s
>> StringLoopJmhBenchmark.getBytes             ASCII          100000  thrpt    5     458881.155 ±    12818.317  ops/s
>> StringLoopJmhBenchmark.getBytes            LATIN1              10  thrpt    5   37193762.990 ±  3962947.391  ops/s
>> StringLoopJmhBenchmark.getBytes            LATIN1             100  thrpt    5   55400876.236 ±  1267331.434  ops/s
>> StringLoopJmhBenchmark.getBytes            LATIN1            1000  thrpt    5   11104514.001 ±    41718.545  ops/s
>> StringLoopJmhBenchmark.getBytes            LATIN1          100000  thrpt    5     182535.414 ±    10296.120  ops/s
>> StringLoopJmhBenchmark.getBytes             UTF16              10  thrpt    5  113474681.457 ±  8326589.199  ops/s
>> StringLoopJmhBenchmark.getBytes             UTF16             100  thrpt    5   37854103.127 ±  4808526.773  ops/s
>> StringLoopJmhBenchmark.getBytes             UTF16            1000  thrpt    5    4139833.009 ±    70636.784  ops/s
>> StringLoopJmhBenchmark.getBytes             UTF16          100000  thrpt    5      57644.637 ±     1887.112  ops/s
>> StringLoopJmhBenchmark.getBytesLength       ASCII              10  thrpt    5  946701647.247 ± 76938927.141  ops/s
>> StringLoopJmhBenchmark.getBytesLength       ASCII             100  thrpt    5  396615374.479 ± 15167234.884  ops/s
>> StringLoopJmhBenchmark.getBytesLength       ASCII            1000  thrpt    5  100464784.979 ±   794027.897  ops/s
>> StringLoopJmhBenchmark.getBytesLength       ASCII          100000  thrpt    5    1215487.689 ±     1916.468  ops/s
>> StringLoopJmhBenchmark.getBytesLength      LATIN1              10  thrpt    5  221265102.323 ± 17013983.056  ops/s
>> StringLoopJmhBenchmark.getBytesLength      LATIN1             100  thrpt    5  137617873.887 ±  5842185.781  ops/s
>> StringLoopJmhBenchmark.getBytesLength      LATIN1            1000  thrpt    5   92540259.1...
>
> Liam Miller-Cushon has updated the pull request incrementally with one 
> additional commit since the last revision:
> 
>   Rename getBytesLength to getByteLength

Excuse my last-minute bikeshedding, but naming in `java.lang.String` is 
important and everlasting, so here we go:

I understand the reasoning behind `getBytesLength` / `getByteLength` aligning 
with the preexisting `getBytes` method. However, seen as an independent API 
name, it is quite weak.

What is a "byte length" anyway? 8 bits? Any outsider or newcomer would need to 
dig into the API docs to figure out what this method does.

Could we at least consider a name with stronger semantic expressiveness?

Could we also consider having this method's name lean on `length()` instead 
of `getBytes()`?

What this returns is *the length in bytes of this String encoded with the given 
Charset*.
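
To make that concrete, here is a minimal sketch of the same value computed 
with today's API by materializing the byte array. The class 
`EncodedLengthDemo` and the helper `encodedLength` are hypothetical names 
used only for this illustration; they are not the method under review, which 
presumably avoids the allocation.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodedLengthDemo {
    // Hypothetical helper, for illustration only: the length in bytes of
    // the String encoded with the given Charset, obtained the existing way
    // by allocating the encoded array and reading its length.
    static int encodedLength(String s, Charset cs) {
        return s.getBytes(cs).length;
    }

    public static void main(String[] args) {
        // "Bjørsnøs", written with escapes so source encoding does not matter
        String s = "Bj\u00f8rsn\u00f8s";
        System.out.println(s.length());                                    // 8 chars
        System.out.println(encodedLength(s, StandardCharsets.UTF_8));      // 10 bytes
        System.out.println(encodedLength(s, StandardCharsets.ISO_8859_1)); // 8 bytes
        System.out.println(encodedLength(s, StandardCharsets.UTF_16LE));   // 16 bytes
    }
}
```

The result depends on the charset and generally differs from `length()`, 
which is why a name built around "encoded" and "length" reads more naturally 
to me than one built around "bytes".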

Was `getEncodedLength` considered?

I find it a bit revealing that the private implementation methods are named 
`encodedLength***`. 

So perhaps `encodedLength` could work?

I expect this method to have few "regular" users. Maybe not confusing the 99% 
of String users that will never need this method is more important than 
improving discoverability for the 1%?

If I'm right that this method will have few "regular" users, perhaps 
specialists would be better served by finding it in 
`java.nio.charset.Charset`?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/28454#issuecomment-3860582916
