Re: RFR: 8325340: Add ASCII fast-path to Data-/ObjectInputStream.readUTF [v5]

Raffaello Giulietti Thu, 15 Feb 2024 03:03:54 -0800

On Thu, 15 Feb 2024 10:55:38 GMT, Raffaello Giulietti <rgiulie...@openjdk.org> 
wrote:


>> The specification is somewhat ambiguous:
>> https://docs.oracle.com/en/java/javase/21/docs/api/java.base/java/io/DataInput.html#readUTF()
>> 
>> There's a sweeping `Throws UTFDataFormatException - if the bytes do not 
>> represent a valid modified UTF-8 encoding of a string` but also: `If the 
>> first byte of a group matches the bit pattern 0xxxxxxx (where x means "may 
>> be 0 or 1"), then the group consists of just that byte. The byte is 
>> zero-extended to form a character.` I think the latter gives some leeway on 
>> being lenient on embedded zeros, even if it's made clear elsewhere that 
>> valid encoders need to replace zeros with the `0xC0, 0x80` sequence.
>
> In fact, the implementations of `readUTF*()` in `DataInputStream` and 
> `ObjectInputStream` are much more lenient than that. They also accept ASCII 
> characters that are encoded with 2 bytes instead of 1. There's no check that 
> the encoding is "minimal length".

This leniency is in accordance with the `DataInput` specification.
So exactly which inputs should trigger `UTFDataFormatException` is somewhat ambiguous.
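
A minimal sketch of the leniency described above, assuming the decoding
behaviour of current `DataInputStream.readUTF()`. The class name
`LenientReadUtf` and the helper `decode` are hypothetical and exist only for
this illustration. The helper prepends the 2-byte length prefix that
`readUTF()` expects and then decodes the given bytes.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;

class LenientReadUtf {
    // Hypothetical helper: wraps the given modified UTF-8 bytes with the
    // unsigned 16-bit length prefix and decodes them via readUTF().
    static String decode(int... body) throws IOException {
        byte[] buf = new byte[body.length + 2];
        buf[0] = (byte) (body.length >>> 8);
        buf[1] = (byte) body.length;
        for (int i = 0; i < body.length; i++) {
            buf[i + 2] = (byte) body[i];
        }
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(buf))) {
            return in.readUTF();
        }
    }

    public static void main(String[] args) throws IOException {
        // 'A' (0x41) in a non-minimal 2-byte form: 110_00001 10_000001.
        // A strict decoder would reject this; readUTF() accepts it.
        String overlong = decode(0xC1, 0x81);
        System.out.println("overlong 'A' decoded as: " + overlong);

        // A raw 0x00 byte, which a conforming encoder would have written
        // as 0xC0 0x80, is also accepted and zero-extended to '\0'.
        String withZero = decode('a', 0x00, 'b');
        System.out.println("embedded zero, length = " + withZero.length()
                + ", middle char = " + (int) withZero.charAt(1));

        // The canonical encoding of the same NUL character.
        String canonicalZero = decode('a', 0xC0, 0x80, 'b');
        System.out.println("same string as canonical form: "
                + withZero.equals(canonicalZero));
    }
}
```

Neither input is rejected, so a fast path that is equally lenient on embedded
zeros would not change observable behaviour for these cases.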

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/17734#discussion_r1490826777
