On Wed, 14 Feb 2024 11:29:43 GMT, Claes Redestad <redes...@openjdk.org> wrote:

>> src/java.base/share/classes/java/io/DataInputStream.java line 604:
>> 
>>> 602:                 // For ASCII ISO-8859-1 is equivalent to UTF-8, while avoiding a redundant
>>> 603:                 // scan
>>> 604:                 return new String(bytearr, 0, utflen, StandardCharsets.ISO_8859_1);
>> 
>> Not sure this is correct.
>> If `bytearr` contains some `(byte)0`, that is, if `in` is malformed, this 
>> doesn't throw `UTFDataFormatException`, but it should: modified UTF-8 cannot 
>> contain zeros.
>
> While properly encoded modified UTF-8 strings won't have embedded zeros 
> (`\u0000` is encoded as `0xC0, 0x80`), the decoding routines in 
> `DataInputStream` and `ObjectInputStream` allow them and do not throw an 
> exception if an embedded zero is encountered. This PR does not change 
> semantics here AFAICT. If you think we need to be stricter in these decoders, 
> that could be done as a separate RFE and I'll put this on hold.

Ah OK.

I didn't check the current code, only the proposed one.
Although the specification clearly states that the method should throw, if the 
current code does not throw on zeros, then it makes sense that the proposed one 
shouldn't either.
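
For anyone who wants to check the behaviour discussed above, here is a minimal sketch. It assumes only the public `DataInputStream.readUTF` and `DataOutputStream.writeUTF` APIs; the class name `ModifiedUtf8ZeroDemo` and its helper methods are made up for illustration and are not part of the PR. It shows that `writeUTF` encodes `\u0000` as `0xC0, 0x80`, and that `readUTF` accepts a raw zero byte in the payload instead of throwing `UTFDataFormatException`.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.Arrays;

// Hypothetical demo class, not part of the JDK or of the PR under review.
public class ModifiedUtf8ZeroDemo {

    // Decodes one frame: a 2-byte big-endian length followed by the payload.
    static String decode(byte[] frame) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(frame))) {
            return in.readUTF();
        }
    }

    // Encodes a string the way DataOutputStream.writeUTF does (modified UTF-8).
    static byte[] encode(String s) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF(s);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Well-formed: the NUL char becomes the two-byte sequence 0xC0 0x80.
        System.out.println(Arrays.toString(encode("a\0b")));

        // Malformed: payload length 3 with a raw zero byte in the middle.
        byte[] malformed = {0, 3, 'a', 0, 'b'};
        String decoded = decode(malformed);

        // The zero byte is decoded as the NUL char; no exception is thrown.
        System.out.println(decoded.length() + " chars, middle char = " + (int) decoded.charAt(1));
    }
}
```

The malformed frame decodes to the same three-character string that the well-formed encoding represents, which is the leniency described above.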

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/17734#discussion_r1489331002
