Re: [DISCUSS] Modello release

Vladimir Sitnikov Tue, 15 Feb 2022 04:38:50 -0800

>correctly even if the encoding changes, because that will lead to the file
>being overwritten.


Once again: decoding does **not** guarantee that you get an **invalid**
string when decoding fails.
The replacement string might look like a regular string, and it might even
collide with the string you intend to write.

Here's an example:
1. Suppose someone writes the Cyrillic letter ф to the file out.txt in UTF-8
encoding.
jshell> "ф".getBytes("UTF-8")
$1 ==> byte[2] { -47, -124 }

2. Suppose there's a hypothetical encoding UTF-Z that replaces undecodable
bytes with an ASCII "?".
Suppose also that UTF-Z treats all "negative" bytes (bytes with the high bit
set) as undecodable.

3. Suppose someone attempts to write "??" (two ASCII question marks) to the
file out.txt using the UTF-Z encoding.
The expected file contents are the bytes 63, 63.
"CachingWriter" would try to decode the existing file contents using UTF-Z,
and it would end up with "??" (two ASCII question marks, because the file
contains two negative bytes).

4. The result of decoding would coincidentally match the desired string
contents (remember that at step 3 we wanted to store "??"),
so the caching writer would **skip** the file overwrite even though the file
must be updated.
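
The steps above can be sketched in Java. UTF-Z does not exist, so this
sketch assumes a stand-in: a US-ASCII decoder configured to replace
undecodable bytes with "?". The class and method names are made up for
illustration and are not taken from Modello or CachingWriter.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class DecodeCollisionDemo {
    // Assumption: stand-in for the hypothetical "UTF-Z". It decodes US-ASCII
    // and replaces every undecodable byte with an ASCII '?'.
    static String decodeLikeUtfZ(byte[] bytes) throws CharacterCodingException {
        CharsetDecoder decoder = StandardCharsets.US_ASCII.newDecoder()
                .onMalformedInput(CodingErrorAction.REPLACE)
                .onUnmappableCharacter(CodingErrorAction.REPLACE)
                .replaceWith("?");
        return decoder.decode(ByteBuffer.wrap(bytes)).toString();
    }

    public static void main(String[] args) throws CharacterCodingException {
        // Step 1: the file currently holds the letter ф (U+0444) in UTF-8.
        byte[] onDisk = "\u0444".getBytes(StandardCharsets.UTF_8);

        // Step 3: the new content is two ASCII question marks.
        String desired = "??";
        byte[] desiredBytes = desired.getBytes(StandardCharsets.US_ASCII);

        // Step 4: comparing decoded strings reports "unchanged" ...
        String decoded = decodeLikeUtfZ(onDisk);
        System.out.println("strings equal: " + decoded.equals(desired));

        // ... although the bytes on disk differ from the bytes to be written.
        System.out.println("bytes equal:   " + Arrays.equals(onDisk, desiredBytes));
    }
}
```

The string comparison reports a match while the byte comparison does not,
which is exactly the case where the overwrite would be skipped by mistake.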

Note: the example does not work for UTF-8 because the UTF-8 decoder
substitutes a special replacement character (U+FFFD) for malformed input.
However, there's no guarantee that every encoding uses such a special
character for undecodable input.
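
For comparison, here is a small sketch of the UTF-8 behaviour described in
the note. The class name and the input bytes (0xFF, 0xFE, which are never
valid in UTF-8) are chosen only for illustration.

```java
import java.nio.charset.StandardCharsets;

public class Utf8ReplacementDemo {
    // new String(byte[], Charset) never throws on bad input; it substitutes
    // the charset's default replacement string instead.
    static String decodeUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] invalid = { (byte) 0xFF, (byte) 0xFE };
        String decoded = decodeUtf8(invalid);

        // The result contains U+FFFD rather than ordinary-looking text,
        // so it cannot be mistaken for the string "??".
        System.out.println("contains U+FFFD: " + decoded.contains("\uFFFD"));
        System.out.println("equals \"??\":     " + decoded.equals("??"));
    }
}
```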

Vladimir
