On Tue, 15 Apr 2025 23:20:45 GMT, Sergey Bylokhov <s...@openjdk.org> wrote:

> can we also force this rule by the jcheck?

Well, yes and no. First, we can verify that we do not have invalid UTF-8. That 
might be a signal that the encoding is wrong. But then this check needs to be 
able to distinguish between pure binary files that happen to look like 
improperly encoded UTF-8 files, and actually incorrectly encoded text files. In 
the end, this is likely to be more of a heuristic for a warning, rather than 
something we can block integration on.
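
As a rough illustration of the first point, a strict UTF-8 validity test can
be sketched in plain Java. This is only a sketch, not jcheck code: the class
name `Utf8Check` and the method `isValidUtf8` are made up for the example.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not part of jcheck.
class Utf8Check {
    // Returns true if the bytes decode as UTF-8 with no malformed sequences.
    static boolean isValidUtf8(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] utf8 = "caf\u00e9".getBytes(StandardCharsets.UTF_8);
        byte[] latin1 = "caf\u00e9".getBytes(StandardCharsets.ISO_8859_1);
        // First four bytes of the PNG signature, standing in for a binary file.
        byte[] binary = {(byte) 0x89, 0x50, 0x4E, 0x47};

        System.out.println(isValidUtf8(utf8));   // true
        System.out.println(isValidUtf8(latin1)); // false
        System.out.println(isValidUtf8(binary)); // false
    }
}
```

The latin-1 text and the binary data both fail the same way, and the check
alone cannot say which one it is looking at. That is why it is better suited
to a warning than to a blocking rule.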

Secondly, files can have incorrect encodings but still pass as valid UTF-8. 
Only a human can tell that the content would be incorrect if we were to assume 
the encoding is UTF-8 instead of e.g. latin-1. This cannot be checked by 
jcheck, but must be caught by reviewers.
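
A small example of the second point, again only as an illustrative sketch
(the class `AmbiguousEncoding` is made up): the two bytes 0xC3 0xA9 decode
without error both as UTF-8 and as latin-1, but to different text.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; the byte values are chosen for illustration.
class AmbiguousEncoding {
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        byte[] bytes = {(byte) 0xC3, (byte) 0xA9};

        String asUtf8 = decode(bytes, StandardCharsets.UTF_8);
        String asLatin1 = decode(bytes, StandardCharsets.ISO_8859_1);

        // One character, U+00E9 (e with acute accent).
        System.out.println(asUtf8.length());   // 1
        // Two characters, U+00C3 followed by U+00A9.
        System.out.println(asLatin1.length()); // 2
    }
}
```

Both readings are well-formed, so no automated validity check can decide
which one the author intended.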

I have been thinking, though, about adding a jcheck warning for non-ASCII 
characters added to known text file types. As a general rule, this is acceptable but 
should only be done judiciously, so it would be good to have jcheck point it 
out. That would also give you an extra chance to verify the encoding.
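
To make the idea concrete, such a warning could look roughly like the sketch
below. Everything in it is an assumption for illustration: the class
`NonAsciiWarning`, the extension list, and the message format are invented
here and do not reflect how jcheck is actually structured or configured.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical sketch; not jcheck's real API or configuration.
class NonAsciiWarning {
    // Assumed list of extensions to treat as text files.
    static final Set<String> TEXT_EXTENSIONS =
            Set.of("java", "c", "cpp", "h", "hpp", "md", "properties", "txt");

    static boolean isKnownTextFile(String path) {
        int dot = path.lastIndexOf('.');
        return dot >= 0 && TEXT_EXTENSIONS.contains(path.substring(dot + 1));
    }

    // Returns one warning per added line that contains a code point above U+007F.
    static List<String> warnings(String path, List<String> addedLines) {
        List<String> result = new ArrayList<>();
        if (!isKnownTextFile(path)) {
            return result;
        }
        for (int i = 0; i < addedLines.size(); i++) {
            String line = addedLines.get(i);
            // Find the first non-ASCII code point on this line, if any.
            int offending = line.codePoints().filter(cp -> cp > 0x7F).findFirst().orElse(-1);
            if (offending >= 0) {
                result.add(String.format(
                        "%s: added line %d contains non-ASCII character U+%04X",
                        path, i + 1, offending));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> added = List.of("int x = 1;", "// caf\u00e9");
        warnings("Foo.java", added).forEach(System.out::println);
        // Prints: Foo.java: added line 2 contains non-ASCII character U+00E9
    }
}
```

A warning like this does not block anything. It only points the author and
reviewers at the line so they can confirm that the character is intended and
that the file really is UTF-8.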

-------------

PR Comment: https://git.openjdk.org/jdk/pull/24574#issuecomment-2809028487
