Re: RFR: 8372353: API to compute the byte length of a String encoded in a given Charset [v17]

Liam Miller-Cushon Mon, 09 Feb 2026 13:36:31 -0800

On Fri, 6 Feb 2026 16:34:38 GMT, Roger Riggs wrote:


> The encoded form is always bytes, so I don't think 'byte' needs to be in the 
> name. I'd be fine with getEncodedLength(Charset).

The javadoc would specify that it's a length in bytes, so perhaps that's 
sufficient without including 'bytes' in the method name.

I do think that some callers might expect `getEncodedLength(UTF_16)` to return 
a length in code units and not bytes. There was some related discussion in 
[JDK-8372338](https://bugs.openjdk.org/browse/JDK-8372338) and also Maurizio's 
[Pulling the (foreign) string](https://cr.openjdk.org/~mcimadamore/panama/strings_ffm.html#reading-strings-with-known-length) 
doc.
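
To make the ambiguity concrete, here is a small sketch using only existing `String` APIs (the class and method names are made up for illustration). It shows the three different numbers a caller might plausibly expect for UTF-16: the count of code units, the byte length with the byte-order mark that the `UTF_16` encoder emits, and the byte length without it.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; only String.length() and String.getBytes(Charset) are real APIs.
final class Utf16LengthDemo {
    // Number of UTF-16 code units, which is what String.length() reports.
    static int codeUnits(String s) {
        return s.length();
    }

    // Bytes produced by the UTF_16 charset: big-endian with a 2-byte BOM prepended.
    static int utf16Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_16).length;
    }

    // Bytes produced by UTF_16BE: two bytes per code unit, no BOM.
    static int utf16BeBytes(String s) {
        return s.getBytes(StandardCharsets.UTF_16BE).length;
    }

    public static void main(String[] args) {
        String s = "hello";
        System.out.println(codeUnits(s));    // 5
        System.out.println(utf16Bytes(s));   // 12
        System.out.println(utf16BeBytes(s)); // 10
    }
}
```

A caller who reads `getEncodedLength(UTF_16)` as "length in code units" would expect 5 here, while a byte-length API would presumably report 12, matching `getBytes`.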

> The discoverability of the method if placed as 
> Charset.getEncodedLength(String) would be very low and would require 
> cross-package hacking to gain the performance advantage.

For completeness, here's a demo of it in `CharsetEncoder` 
(https://github.com/openjdk/jdk/pull/29639). As expected, it's possible to 
implement it that way and preserve equivalent performance, by adding a 
package-private method to `String` and exposing it through `JavaLangAccess`. 
With that change, `string.getByteLength(UTF_8)` could be expressed as:


    try {
        int byteLength = StandardCharsets.UTF_8.newEncoder()
                .onUnmappableCharacter(CodingErrorAction.REPLACE)
                .onMalformedInput(CodingErrorAction.REPLACE)
                .getByteLength(stringData);
    } catch (CharacterCodingException e) {
        throw new IllegalStateException(e);
    }


I can update the CSR to document this as an alternative.
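
For comparison, here is a rough sketch of what callers can do today with the existing `CharsetEncoder` API alone, without either proposed method. It encodes into a small reusable scratch buffer and only counts the bytes, so it avoids allocating the full `byte[]` but still does the encoding work. The class name, method name, and buffer size are assumptions for illustration and are not part of the PR.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of the JDK or of the PR under review.
final class EncodedLengthSketch {
    static long encodedLength(String s, Charset cs) {
        // REPLACE mirrors the behavior of String.getBytes(Charset) for bad input.
        CharsetEncoder enc = cs.newEncoder()
                .onMalformedInput(CodingErrorAction.REPLACE)
                .onUnmappableCharacter(CodingErrorAction.REPLACE);
        CharBuffer in = CharBuffer.wrap(s);
        ByteBuffer scratch = ByteBuffer.allocate(1024); // arbitrary scratch size
        long total = 0;
        CoderResult result;
        // Encode chunk by chunk, discarding the bytes and keeping only the count.
        do {
            result = enc.encode(in, scratch, true);
            total += scratch.position();
            scratch.clear();
        } while (result.isOverflow());
        // Some encoders emit trailing bytes on flush, so count those too.
        do {
            result = enc.flush(scratch);
            total += scratch.position();
            scratch.clear();
        } while (result.isOverflow());
        return total;
    }

    public static void main(String[] args) {
        System.out.println(encodedLength("hello", StandardCharsets.UTF_8));       // 5
        System.out.println(encodedLength("h\u00e9llo", StandardCharsets.UTF_8));  // 6
    }
}
```

This is the kind of code the proposed API is meant to replace: it works for any charset but cannot take advantage of `String`'s internal representation.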

> Should we also consider the inverse operation, that is to compute the length 
> of a String had it been decoded from a sequence of bytes? Someone will 
> eventually ask for this. I see some potential use case for it in the ZipFile 
> implementation where knowing the length ahead of decoding could provide 
> efficient rejection of strings without decoding and without looking at String 
> contents.

What is the use-case for `decodedLength` in `ZipFile`? Does 'efficient 
rejection of strings without decoding' require knowing the decoded length, or 
just whether the data is a valid encoding?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/28454#issuecomment-3872766017
