On Thu, 4 Sep 2025 14:40:36 GMT, Chen Liang <li...@openjdk.org> wrote:

>> Guanqiang Han has updated the pull request with a new target base due to a 
>> merge or a rebase. The incremental webrev excludes the unrelated changes 
>> brought in by the merge/rebase. The pull request contains 16 additional 
>> commits since the last revision:
>> 
>>  - move common method into a common file.
>>  - Merge remote-tracking branch 'upstream/master' into 8328874
>>  - Update Class.java
>>    
>>    change overflow check
>>  - Update Class.java
>>    
>>    Simplify length check
>>  - Update Class.java
>>    
>>    avoid the case of int overflow
>>  - Update Class.java
>>    
>>    Use ModifiedUtf.utfLen instead of static import for readability
>>  - change copyright year
>>  - a small fix
>>  - add regression test
>>  - Merge remote-tracking branch 'upstream/master' into 8328874
>>  - ... and 6 more: https://git.openjdk.org/jdk/compare/7a4c9817...edc1694d
>
> src/java.base/share/classes/jdk/internal/util/ModifiedUtf.java line 37:
> 
>> 35: public abstract class ModifiedUtf {
>> 36:     //Max length in Modified UTF-8 bytes for class names.(see 
>> max_symbol_length in symbol.hpp)
>> 37:     public static final int JAVA_CLASSNAME_MAX_LEN = 65535;
> 
> max_symbol_length is not just class names - it is presumably the limit for 
> modified UTF-8, as seen in `java.io.DataOutput::writeUTF`. We can just use a 
> more generic name like `MAX_ENCODED_LENGTH`.

There is no maximum length of an encoded UTF-8 string. "Modified UTF-8" is 
modified because it encodes the zero character (U+0000) using the 2-byte form, 
so the result never contains a zero byte. That allows some use cases to 
terminate the encoded UTF-8 bytes with a NUL byte.
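
As a minimal sketch of that difference, the following compares standard UTF-8 with the modified form written by `java.io.DataOutputStream::writeUTF`. The class and helper names (`NulEncodingDemo`, `modifiedUtf8`, `containsZeroByte`) are made up for illustration and are not part of the PR.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class NulEncodingDemo {
    // Modified UTF-8 bytes of s, as produced by DataOutputStream.writeUTF,
    // with the leading 2-byte length prefix removed.
    static byte[] modifiedUtf8(String s) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeUTF(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        byte[] all = buf.toByteArray();
        return Arrays.copyOfRange(all, 2, all.length);
    }

    // True if any byte in the array is zero.
    static boolean containsZeroByte(byte[] bytes) {
        for (byte b : bytes) {
            if (b == 0) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String s = "a\0b"; // contains the character U+0000
        byte[] standard = s.getBytes(StandardCharsets.UTF_8);
        byte[] modified = modifiedUtf8(s);
        System.out.println("standard UTF-8 has zero byte: " + containsZeroByte(standard));
        System.out.println("modified UTF-8 has zero byte: " + containsZeroByte(modified));
        System.out.println("modified UTF-8 length: " + modified.length);
    }
}
```

The modified form spends one extra byte on U+0000 (`0xC0 0x80`), which is what keeps a zero byte out of the output.
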
In the DataOutput case, it was desirable to provide the length of the encoded 
bytes to make it easy to read or skip the encoded UTF-8. It improved some 
stream decoding but increased the cost of writing because the encoded length 
was needed before writing. It also prevented an exact size allocation before 
decoding, because only the encoded length is recorded, not the decoded one. 
In retrospect, it could have provided both the encoded and decoded lengths, 
saving some allocations.
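
To illustrate the DataOutput case, here is a rough sketch showing that the encoded length has to be known up front and that `writeUTF` rejects anything whose encoded form exceeds 65535 bytes. `encodedLength` and `fitsWriteUTF` are hypothetical helpers written for this example; they are not the `ModifiedUtf` methods from the PR.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UTFDataFormatException;
import java.io.UncheckedIOException;

public class WriteUtfLimitDemo {
    // Hypothetical helper: number of bytes s occupies in modified UTF-8.
    // A long accumulator is used so the sum cannot overflow an int.
    static long encodedLength(String s) {
        long len = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c >= 0x0001 && c <= 0x007F) {
                len += 1;
            } else if (c <= 0x07FF) {
                len += 2; // includes U+0000
            } else {
                len += 3; // includes each half of a surrogate pair
            }
        }
        return len;
    }

    // True if DataOutputStream.writeUTF accepts s, false if it is too long.
    static boolean fitsWriteUTF(String s) {
        try (DataOutputStream out = new DataOutputStream(new ByteArrayOutputStream())) {
            out.writeUTF(s);
            return true;
        } catch (UTFDataFormatException tooLong) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String ok = "x".repeat(65535);
        String tooLong = "x".repeat(65536);
        System.out.println(encodedLength(ok) + " bytes fits: " + fitsWriteUTF(ok));
        System.out.println(encodedLength(tooLong) + " bytes fits: " + fitsWriteUTF(tooLong));
    }
}
```

The limit here comes from the 2-byte length prefix that `writeUTF` emits, not from the encoding itself.
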
In ObjectOutputStream, the stream protocol has both long and short forms 
because Strings can be much longer than a 2-byte length can describe.
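
A small sketch of the two forms, assuming the String is the first object written so that its type code directly follows the 4-byte stream header. `StringFormDemo` and `typeCode` are illustrative names only; `TC_STRING` and `TC_LONGSTRING` are the constants from `java.io.ObjectStreamConstants`.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamConstants;
import java.io.UncheckedIOException;

public class StringFormDemo {
    // Returns the type code that follows the 4-byte stream header
    // (2-byte magic + 2-byte version) when s is the first object written.
    static byte typeCode(String s) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buf)) {
            out.writeObject(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray()[4];
    }

    public static void main(String[] args) {
        String shortString = "abc";
        String longString = "x".repeat(70_000); // more than 65535 encoded bytes
        System.out.println("short form used: "
                + (typeCode(shortString) == ObjectStreamConstants.TC_STRING));
        System.out.println("long form used: "
                + (typeCode(longString) == ObjectStreamConstants.TC_LONGSTRING));
    }
}
```
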
The method names and constants are specific to the encoding of **Class** names 
and that should be reflected in their names.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/26802#discussion_r2323069496
