On Thu, 15 Jan 2026 20:00:43 GMT, Liam Miller-Cushon <[email protected]> wrote:

>> While it is convenient that those UTF-16 charsets have an easy-to-compute size, I
>> doubt those two are in sufficient use to justify a commitment to support them
>> in the fast path.
>> If you are going to support charsets beyond the most common UTF-8, ASCII, and
>> ISO-8859-1, then
>> computing the encoded length should be delegated to the Charset itself and have
>> separate code in different packages.
>> Have you looked at `CharsetEncoder.maxBytesPerChar()`? It might only be
>> useful for single-byte formats, but if `maxBytesPerChar` is equal to
>> `averageBytesPerChar`, that might be a useful shortcut.
>
>> While it is convenient that those UTF-16 charsets have an easy-to-compute size, I
>> doubt those two are in sufficient use to justify a commitment to support them
>> in the fast path. If you are going to support charsets beyond the most
>> common UTF-8, ASCII, and ISO-8859-1, then computing the encoded length should
>> be delegated to the Charset itself and have separate code in different packages.
> 
> Thanks, that makes sense to me. My opinion is that a large amount of the 
> value here is in optimizing UTF-8, and that there's an argument to optimize 
> the other standard charsets that `String` has other fast paths for, but 
> sharply diminishing returns beyond that. I would be inclined to stop at the 
> standard charsets, but also happy to make changes if there's a preference for 
> having more or fewer fast paths.
> 
>> Have you looked at `CharsetEncoder.maxBytesPerChar()`? It might only be
>> useful for single-byte formats, but if `maxBytesPerChar` is equal to
>> `averageBytesPerChar`, that might be a useful shortcut.
> 
> I had a quick look at that, and saw errors for `IBM-Thai`:
> 
> 
>         CharsetEncoder encoder = cs.newEncoder();
>         if (encoder.maxBytesPerChar() == 1f
>                 && encoder.maxBytesPerChar() == encoder.averageBytesPerChar()) {
>             return value.length * (int) encoder.maxBytesPerChar();
>         }

It's good to start with only the most common charsets, then see whether the API
is adopted and whether anyone reports a performance problem with other charsets.
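
To make the scope concrete, here is a minimal sketch of what "fast paths for the
most common charsets, slow path for everything else" could look like. It is not
the PR's implementation: the class `EncodedLengthSketch` and the method
`encodedLength` are hypothetical names, and the slow path simply encodes with
`String.getBytes(Charset)` and takes the array length. The fast paths assume the
replacement behaviour of `String.getBytes`, where anything unmappable, including
an unpaired surrogate, becomes a single `?` byte.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; names are not taken from the PR.
final class EncodedLengthSketch {
    private EncodedLengthSketch() {}

    /** Returns the same value as s.getBytes(cs).length. */
    static int encodedLength(String s, Charset cs) {
        if (cs.equals(StandardCharsets.UTF_8)) {
            return utf8Length(s);
        }
        if (cs.equals(StandardCharsets.ISO_8859_1)
                || cs.equals(StandardCharsets.US_ASCII)) {
            // Every code point, mappable or not, becomes exactly one byte:
            // an unmappable code point (even a surrogate pair) turns into '?'.
            return s.codePointCount(0, s.length());
        }
        // Slow path: any other charset pays for a real encode.
        return s.getBytes(cs).length;
    }

    private static int utf8Length(String s) {
        int bytes = 0;
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            i += Character.charCount(cp);
            if (cp < 0x80) {
                bytes += 1;
            } else if (cp < 0x800) {
                bytes += 2;
            } else if (cp >= 0xD800 && cp <= 0xDFFF) {
                bytes += 1; // unpaired surrogate is replaced by '?'
            } else if (cp < 0x10000) {
                bytes += 3;
            } else {
                bytes += 4; // supplementary code point (valid surrogate pair)
            }
        }
        return bytes;
    }
}
```

With this shape, adding or dropping a fast path is a local change, and any
charset without one still gets a correct answer at the cost of an allocation.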

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/28454#discussion_r2695835410
