On Thu, 15 Jan 2026 20:00:43 GMT, Liam Miller-Cushon <[email protected]> wrote:
>> While it is convenient that those UTF-16 charsets have an easy-to-compute
>> size, I doubt those two are in sufficient use to justify a commitment to
>> support them in the fast path. If you are going to support charsets beyond
>> the most common UTF-8, ASCII, and ISO-8859-1, then computing the encoded
>> length should be delegated to the Charset itself, with separate code in
>> different packages.
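
To make the "easy to compute size" point concrete, here is a minimal sketch. It assumes the usual JDK behaviour of `String.getBytes` for the UTF-16 family: every `char` encodes to two bytes, and the plain "UTF-16" charset writes a two-byte byte-order mark before non-empty input. The names `Utf16Length` and `utf16Length` are hypothetical and are not part of the proposed API.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class Utf16Length {

    // Hypothetical helper: predicts s.getBytes(cs).length for the three
    // UTF-16 charsets in StandardCharsets without encoding the string.
    static int utf16Length(String s, Charset cs) {
        if (s.isEmpty()) {
            return 0; // no byte-order mark is written for empty input
        }
        // Each char, including each half of a surrogate pair, takes two bytes.
        int body = 2 * s.length();
        if (cs.equals(StandardCharsets.UTF_16)) {
            return body + 2; // "UTF-16" encodes big-endian with a leading BOM
        }
        if (cs.equals(StandardCharsets.UTF_16BE)
                || cs.equals(StandardCharsets.UTF_16LE)) {
            return body; // explicit byte order, so no BOM
        }
        throw new IllegalArgumentException("not a UTF-16 charset: " + cs);
    }

    public static void main(String[] args) {
        Charset[] charsets = {
            StandardCharsets.UTF_16, StandardCharsets.UTF_16BE, StandardCharsets.UTF_16LE
        };
        String[] samples = {"", "a", "h\u00e9llo", "\uD83D\uDE00"};
        // Print predicted vs. actual lengths for each charset/sample pair.
        for (Charset cs : charsets) {
            for (int i = 0; i < samples.length; i++) {
                String s = samples[i];
                System.out.println(cs + " sample " + i + ": predicted "
                        + utf16Length(s, cs) + ", actual " + s.getBytes(cs).length);
            }
        }
    }
}
```

The sketch only illustrates how little code the UTF-16 case needs; it says nothing about whether these charsets are used widely enough to belong in the fast path.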
>
> Thanks, that makes sense to me. In my opinion, a large part of the value
> here is in optimizing UTF-8. There's an argument for also optimizing the
> other standard charsets that `String` already has fast paths for, but the
> returns diminish sharply beyond that. I would be inclined to stop at the
> standard charsets, but I'm also happy to make changes if there's a
> preference for more or fewer fast paths.
>
>> Have you looked at `CharsetEncoder.maxBytesPerChar()`? It might only be
>> useful for single-byte formats, but if `maxBytesPerChar` is equal to
>> `averageBytesPerChar`, that might be a useful shortcut.
>
> I had a quick look at that, and saw errors for `IBM-Thai`:
>
> // Shortcut: treat the charset as exactly one byte per char
> CharsetEncoder encoder = cs.newEncoder();
> if (encoder.maxBytesPerChar() == 1f
>         && encoder.maxBytesPerChar() == encoder.averageBytesPerChar()) {
>     return value.length * (int) encoder.maxBytesPerChar();
> }
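
One way to check the `maxBytesPerChar`/`averageBytesPerChar` shortcut is to compare its prediction with `String.getBytes` across every available charset. This is a minimal sketch, assuming `s.length()` is an acceptable stand-in for the snippet's `value.length`. `SingleByteShortcut`, `shortcutLength` and `agrees` are hypothetical names. Independently of whatever causes the `IBM-Thai` errors, a char count can overestimate even for ISO-8859-1, because `String.getBytes` replaces a valid surrogate pair with a single `'?'` byte.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

public class SingleByteShortcut {

    // The shortcut from the snippet, applied to a String. s.length() stands
    // in for the snippet's value.length (assumed here to be a char count).
    // Returns -1 when the shortcut does not apply to this charset.
    static int shortcutLength(String s, Charset cs) {
        CharsetEncoder encoder = cs.newEncoder();
        if (encoder.maxBytesPerChar() == 1f
                && encoder.maxBytesPerChar() == encoder.averageBytesPerChar()) {
            return s.length() * (int) encoder.maxBytesPerChar();
        }
        return -1;
    }

    // True if the shortcut does not apply, or applies and matches getBytes.
    static boolean agrees(String s, Charset cs) {
        int predicted = shortcutLength(s, cs);
        return predicted < 0 || predicted == s.getBytes(cs).length;
    }

    public static void main(String[] args) {
        String[] samples = {"abc", "h\u00e9llo", "\u0e01\u0e02", "\uD83D\uDE00"};
        // Report every encodable charset for which the shortcut mispredicts.
        for (Charset cs : Charset.availableCharsets().values()) {
            if (!cs.canEncode()) {
                continue; // newEncoder() would throw UnsupportedOperationException
            }
            for (int i = 0; i < samples.length; i++) {
                String s = samples[i];
                if (!agrees(s, cs)) {
                    System.out.println(cs.name() + " sample " + i + ": predicted "
                            + shortcutLength(s, cs) + ", actual " + s.getBytes(cs).length);
                }
            }
        }
        System.out.println("ISO-8859-1 agrees on \"abc\": "
                + agrees("abc", StandardCharsets.ISO_8859_1));
    }
}
```

The list of mispredicting charsets depends on which charsets the runtime provides, so no particular output is implied here.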

It's good to start with only the most common charsets, and to see whether the
API is adopted and whether anyone reports a performance problem with other
charsets.
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/28454#discussion_r2695835410