On Tue, 30 Jan 2024 18:28:32 GMT, Archie Cobbs <aco...@openjdk.org> wrote:
>> Please consider this fix to ensure that going from `MessageFormat` to >> pattern string via `toPattern()` and then back via `new MessageFormat()` >> results in a format that is equivalent to the original. >> >> The quoting and escaping rules for `MessageFormat` pattern strings are >> really tricky. I admit not completely understanding them. At a high level, >> they work like this: The normal way one would "nest" strings containing >> special characters is with straightforward recursive escaping like with the >> `bash` command line. For example, if you want to echo `a "quoted string" >> example` then you enter `echo "a "quoted string" example"`. With this scheme >> it's always the "outer" layer's job to (un)escape special characters as >> needed. That is, the echo command never sees the backslash characters. >> >> In contrast, with `MessageFormat` and friends, nested subformat pattern >> strings are always provided "pre-escaped". So to build an "outer" string >> (e.g., for `ChoiceFormat`) the "inner" subformat pattern strings are more or >> less just concatenated, and then only the `ChoiceFormat` option separator >> characters (e.g., `<`, `#`, `|`, etc.) are escaped. >> >> The "pre-escape" escaping algorithm escapes `{` characters, because `{` >> indicates the beginning of a format argument. However, it doesn't escape `}` >> characters. This is OK because the format string parser treats any "extra" >> closing braces (where "extra" means not matching an opening brace) as plain >> characters. >> >> So far, so good... at least, until a format string containing an extra >> closing brace is nested inside a larger format string, where the extra >> closing brace, which was previously "extra", can now suddenly match an >> opening brace in the outer pattern containing it, thus truncating it by >> "stealing" the match from some subsequent closing brace. >> >> An example is the `MessageFormat` string `"{0,choice,0.0#option A: >> {1}|1.0#option B: {1}'}'}"`. Note the second option format string has a >> trailing closing brace in plain text. If you create a `MessageFormat` with >> this string, you see a trailing `}` only with the second option. >> >> However, if you then invoke `toPattern()`, the result is >> `"{0,choice,0.0#option A: {1}|1.0#option B: {1}}}"`. Oops, now because the >> "extra" closing brace is no longer quoted, it matches the opening brace at >> the beginning of the string, and the following closing brace, which was the >> previous match, is now just plain text in the outer `MessageFormat` string. >> >> As a result, invoking `f.format(new ... > > Archie Cobbs has updated the pull request incrementally with one additional > commit since the last revision: > > Fix misspelling in comment. Left some minor comments, but LGTM, I also ran this change against internal tests and all is good there. Others may have suggestions, in any case, you will still need a review from someone with 'reviewer' status. test/jdk/java/text/Format/MessageFormat/MessageFormatToPatternTest.java line 26: > 24: /* > 25: * @test > 26: * @summary Verify MessageFormat.toPattern() is equivalent to original > pattern Could make the summary a little more specific for future readers test/jdk/java/text/Format/MessageFormat/MessageFormatToPatternTest.java line 70: > 68: @BeforeAll > 69: public static void setup() { > 70: savedLocale = Locale.getDefault(); I'm not sure we need to save the default locale and restore it, unless I'm missing something. test/jdk/java/text/Format/MessageFormat/MessageFormatToPatternTest.java line 104: > 102: Arguments.of("{0,choice,0.0#option A: {0}|1.0#option B: > {0}'}'}", "option B: 1.23}"), > 103: Arguments.of("{0,choice,0.0#option A: {0}|2.0#option B: > {0}'}'}", "option A: 1.23"), > 104: Suggestion: // Absurd double quote examples Arguments.of("Foo '}''''''''}' {0,number,bar'}' '}' } baz ", "Foo }''''} bar} } 1 baz "), Arguments.of("'''}''{'''}''''}'"), "'}'{'}''}"), test/jdk/java/text/Format/MessageFormat/MessageFormatsByArgumentIndex.java line 35: > 33: import java.text.NumberFormat; > 34: > 35: public class MessageFormatsByArgumentIndex { Can you bump the latter copyright year for this file test/jdk/java/text/Format/MessageFormat/MessageRegression.java line 114: > 112: * MessageFormat.toPattern has weird rounding behavior. > 113: */ > 114: @Test This file as well ------------- Marked as reviewed by jlu (Committer). PR Review: https://git.openjdk.org/jdk/pull/17416#pullrequestreview-1857862268 PR Review Comment: https://git.openjdk.org/jdk/pull/17416#discussion_r1475280475 PR Review Comment: https://git.openjdk.org/jdk/pull/17416#discussion_r1475279603 PR Review Comment: https://git.openjdk.org/jdk/pull/17416#discussion_r1475294193 PR Review Comment: https://git.openjdk.org/jdk/pull/17416#discussion_r1475277825 PR Review Comment: https://git.openjdk.org/jdk/pull/17416#discussion_r1475278147