On Thu, 4 Sep 2025 18:19:40 GMT, Roger Riggs <rri...@openjdk.org> wrote:

>> src/java.base/share/classes/jdk/internal/util/ModifiedUtf.java line 37:
>>
>>> 35: public abstract class ModifiedUtf {
>>> 36:     //Max length in Modified UTF-8 bytes for class names.(see max_symbol_length in symbol.hpp)
>>> 37:     public static final int JAVA_CLASSNAME_MAX_LEN = 65535;
>>
>> max_symbol_length is not just class names - it is presumably the limit for
>> modified UTF-8, as seen in `java.io.DataOutput::writeUTF`. We can just use a
>> more generic name like `MAX_ENCODED_LENGTH`.
>
> There is no maximum length of an encoded UTF-8 string. The "modified UTF-8"
> is modified because it encodes a zero byte using the 2-byte version so the
> result never contains a null. This allows some use cases to terminate the
> encoded UTF-8 bytes with a nul byte.
> In the DataOutput case, it was desirable to provide the length of the encoded
> bytes to make it easy to read or skip the encoded UTF-8. It improved some
> stream decoding but increased the cost of writing because the encoded length
> was needed before writing. It also prevented an exact size allocation before
> decoding. In retrospect, it could have provided both the encoded and decoded
> lengths, saving some allocations.
> In ObjectOutputStream, the stream protocol had both long and short forms
> because Strings can be much longer.
> The method names and constants are specific to the encoding of **Class**
> names and that should be reflected in their names.

These are specific to the encoding of all UTF-8 class file constants too, not just class names.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/26802#discussion_r2323100636
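
For illustration, here is a minimal sketch of the two properties discussed in the quoted thread: the 2-byte encoding of U+0000 and the 65535-byte limit that `java.io.DataOutput::writeUTF` inherits from its 2-byte length prefix. It uses only the public `DataOutputStream` API, not the internal `ModifiedUtf` class under review. The class name `ModifiedUtfDemo` and its helper methods are hypothetical names chosen for this example.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UTFDataFormatException;

// Hypothetical demo class; not part of the JDK or of the PR under review.
final class ModifiedUtfDemo {

    // Returns what writeUTF emits: a 2-byte big-endian length prefix
    // followed by the modified UTF-8 encoding of s.
    static byte[] writeUtf(String s) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF(s);
        }
        return bytes.toByteArray();
    }

    // True if s can be written with writeUTF, i.e. its encoded length
    // fits in the unsigned 16-bit length prefix (at most 65535 bytes).
    static boolean fitsInWriteUtf(String s) throws IOException {
        try {
            writeUtf(s);
            return true;
        } catch (UTFDataFormatException e) {
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        // U+0000 is encoded as the two bytes C0 80, so the payload
        // never contains a zero byte.
        for (byte b : writeUtf("\0")) {
            System.out.printf("%02X ", b);   // prints "00 02 C0 80 "
        }
        System.out.println();

        System.out.println(fitsInWriteUtf("a".repeat(65535))); // prints "true"
        System.out.println(fitsInWriteUtf("a".repeat(65536))); // prints "false"
    }
}
```

The limit here comes from the `writeUTF` length prefix, not from modified UTF-8 itself, which is consistent with the point that the encoding has no inherent maximum length.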