On Thu, 4 Sep 2025 18:19:40 GMT, Roger Riggs <rri...@openjdk.org> wrote:

>> src/java.base/share/classes/jdk/internal/util/ModifiedUtf.java line 37:
>> 
>>> 35: public abstract class ModifiedUtf {
>>> 36:     //Max length in Modified UTF-8 bytes for class names.(see max_symbol_length in symbol.hpp)
>>> 37:     public static final int JAVA_CLASSNAME_MAX_LEN = 65535;
>> 
>> max_symbol_length is not just class names - it is presumably the limit for 
>> modified UTF-8, as seen in `java.io.DataOutput::writeUTF`. We can just use a 
>> more generic name like `MAX_ENCODED_LENGTH`.
>
> There is no maximum length of an encoded UTF-8 string. The "modified UTF-8" 
> is modified because it encodes a zero byte using the 2-byte form, so the 
> result never contains a NUL. This allows some use cases to terminate the 
> encoded UTF-8 bytes with a NUL byte.
> In the DataOutput case, it was desirable to provide the length of the encoded 
> bytes to make it easy to read or skip the encoded UTF-8. It improved some 
> stream decoding but increased the cost of writing because the encoded length 
> was needed before writing. It also prevented an exact size allocation before 
> decoding.  In retrospect, it could have provided both the encoded and decoded 
> lengths, saving some allocations.
> In ObjectOutputStream, the stream protocol had both long and short forms 
> because Strings can be much longer.
> The method names and constants are specific to the encoding of **Class** 
> names and that should be reflected in their names.

These apply to the encoding of all UTF-8 class file constants too, not only 
to Class names.
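
To make the limit under discussion concrete, here is a minimal sketch using only `java.io.DataOutputStream::writeUTF`, which is assumed to behave like any other length-prefixed modified UTF-8 encoder. The class `ModifiedUtfSketch`, its methods, and the constant name `MAX_ENCODED_LENGTH` are hypothetical and are not taken from the PR. It shows the 2-byte encoding of a zero char and the rejection of a string whose encoded form no longer fits in the unsigned 16-bit length prefix.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UTFDataFormatException;
import java.io.UncheckedIOException;

// Hypothetical demo class; none of these names come from the PR.
class ModifiedUtfSketch {
    // Largest value that fits in an unsigned 16-bit length prefix.
    static final int MAX_ENCODED_LENGTH = 65535;

    // Number of modified UTF-8 bytes needed for s (length prefix excluded).
    static long encodedLength(String s) {
        long len = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c >= 0x0001 && c <= 0x007F) {
                len += 1; // ASCII except NUL
            } else if (c <= 0x07FF) {
                len += 2; // includes U+0000, written as C0 80
            } else {
                len += 3; // each surrogate char is encoded on its own
            }
        }
        return len;
    }

    // Bytes produced by writeUTF (length prefix included),
    // or null if writeUTF rejects the string as too long.
    static byte[] writeUtf(String s) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF(s);
        } catch (UTFDataFormatException e) {
            return null; // encoded form is longer than 65535 bytes
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        // Two length bytes, then the two-byte encoding of the zero char.
        for (byte b : writeUtf("\0")) {
            System.out.printf("%02X ", b & 0xFF);
        }
        System.out.println();

        String atLimit = "a".repeat(MAX_ENCODED_LENGTH);
        String overLimit = atLimit + "a";
        System.out.println(encodedLength(atLimit) + " accepted=" + (writeUtf(atLimit) != null));
        System.out.println(encodedLength(overLimit) + " accepted=" + (writeUtf(overLimit) != null));
    }
}
```

The same arithmetic applies wherever the encoded bytes are preceded by a 2-byte length, which is why a name tied only to Class names seems too narrow.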

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/26802#discussion_r2323100636
