On Thu, 3 Oct 2024 08:52:01 GMT, Jeremie Miserez <d...@openjdk.org> wrote:
>> Mapping ISO-8859-8-I charset to ISO-8859-8.
>> Below mentioned 2 aliases are added as part of this:-
>> **ISO-8859-8-I**
>> **ISO8859-8-I**
>>
>> The bug report for the same:- https://bugs.openjdk.org/browse/JDK-8195686
>
> One more thing: I forgot to explain why the alias ISO-8859-8-i -> ISO-8859-8
> would definitely be correct.
>
> Java strings are stored in logical order. That is true for both LTR and RTL
> languages. This is plainly apparent from the OpenJDK String source code, but
> also explicitly mentioned/explained e.g. by official tutorials such as here:
> https://docs.oracle.com/javase/tutorial/2d/text/textlayoutbidirectionaltext.html#ordering_text
>
> ISO-8859-8-i texts are always sent in logical order (by definition). **So
> decoding a ISO-8859-8-i text into a Java string using the ISO-8859-8 alias
> will result in the correct order of characters in the Java string, i.e.
> logical order, and thus is always 100% correct by definition.**
>
> Technically speaking, and for completeness sake here is the full list of
> cases for regular ISO-8859-8 today:
>
> 1. ISO-8859-8 texts may contain either LTR language content, in which case
> the text is correctly decoded to a Java string in logical order. -> OK
> 2. ISO-8859-8 texts may also contain RTL language content in logical order
> (newer applications already do this), in which case the text is also
> correctly decoded to a Java string in logical order. -> OK.
> 3. But: If a ISO-8859-8 text contains RTL language content in visual order
> (old applications, historically the case), the text would be decoded to a
> Java string in visual order. This is actually technically incorrect and may
> be a source of bugs (e.g. concatenation won't work correctly). However this
> behavior cannot be changed in OpenJDK anymore as (old) applications may rely
> on it.
>
> So: Case 2 is what would happen if the alias was added. Now as long as nobody
> adds a "auto-reverse visual to logical order" heuristic for RTL ISO-8859-8
> text decoding in OpenJDK (which I'm fairly certain can't / mustn't be done),
> using a simple alias ISO-8859-8-i -> ISO-8859-8 will thus always be correct.
> The alias will result in case 2, i.e. texts will always be decoded into the
> correct Java string in logical order.

@jmiserez wrote:

> But: If a ISO-8859-8 text contains RTL language content in visual order (old
> applications, historically the case), the text would be decoded to a Java
> string in visual order. This is actually technically incorrect and may be a
> source of bugs (e.g. concatenation won't work correctly). However this
> behavior cannot be changed in OpenJDK anymore as (old) applications may rely
> on it.

In other words, Java _may_ have been incorrectly handling `ISO-8859-8` all this time if content was in visual order. Putting in this alias means that ISO-8859-8-I will be handled correctly.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20690#issuecomment-2403364716
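
For illustration only, cases 2 and 3 from the quoted list can be sketched as below. This is a minimal sketch, not code from the PR: the class name `Iso88598OrderSketch`, the helper `decode`, and the sample word are hypothetical, and it assumes the running JDK ships the `ISO-8859-8` charset. The decoder maps each byte to one char without reordering, so the resulting string simply keeps whatever order the bytes arrived in.

```java
import java.nio.charset.Charset;

final class Iso88598OrderSketch {
    // Hypothetical helper: decode with the plain ISO-8859-8 charset.
    // The decoder maps bytes to chars one-to-one and never reorders them.
    static String decode(byte[] bytes) {
        return new String(bytes, Charset.forName("ISO-8859-8"));
    }

    public static void main(String[] args) {
        // The Hebrew word "shalom" in logical order: shin, lamed, vav, final mem.
        byte[] logical = {(byte) 0xF9, (byte) 0xEC, (byte) 0xE5, (byte) 0xED};
        // The same word as a legacy visual-order producer would have stored it.
        byte[] visual = {(byte) 0xED, (byte) 0xE5, (byte) 0xEC, (byte) 0xF9};

        String expected = "\u05E9\u05DC\u05D5\u05DD";
        String reversed = new StringBuilder(expected).reverse().toString();

        // Case 2: logical-order bytes give a logical-order String.
        System.out.println(decode(logical).equals(expected));
        // Case 3: visual-order bytes give a visual-order (reversed) String.
        System.out.println(decode(visual).equals(reversed));
        // Whether the alias resolves depends on the JDK build in use.
        System.out.println(Charset.isSupported("ISO-8859-8-I"));
    }
}
```

Since ISO-8859-8-I content is in logical order by definition, an alias to the same decoder would always land in the first of the two outcomes above.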