On Tue, 30 Jan 2024 18:28:32 GMT, Archie Cobbs <[email protected]> wrote:
>> Please consider this fix to ensure that going from `MessageFormat` to
>> pattern string via `toPattern()` and then back via `new MessageFormat()`
>> results in a format that is equivalent to the original.
>>
>> The quoting and escaping rules for `MessageFormat` pattern strings are
>> really tricky. I admit not completely understanding them. At a high level,
>> they work like this: The normal way one would "nest" strings containing
>> special characters is with straightforward recursive escaping like with the
>> `bash` command line. For example, if you want to echo `a "quoted string"
>> example` then you enter `echo "a "quoted string" example"`. With this scheme
>> it's always the "outer" layer's job to (un)escape special characters as
>> needed. That is, the echo command never sees the backslash characters.
>>
>> In contrast, with `MessageFormat` and friends, nested subformat pattern
>> strings are always provided "pre-escaped". So to build an "outer" string
>> (e.g., for `ChoiceFormat`) the "inner" subformat pattern strings are more or
>> less just concatenated, and then only the `ChoiceFormat` option separator
>> characters (e.g., `<`, `#`, `|`, etc.) are escaped.
>>
>> The "pre-escape" escaping algorithm escapes `{` characters, because `{`
>> indicates the beginning of a format argument. However, it doesn't escape `}`
>> characters. This is OK because the format string parser treats any "extra"
>> closing braces (where "extra" means not matching an opening brace) as plain
>> characters.
>>
>> So far, so good... at least, until a format string containing an extra
>> closing brace is nested inside a larger format string, where the extra
>> closing brace, which was previously "extra", can now suddenly match an
>> opening brace in the outer pattern containing it, thus truncating it by
>> "stealing" the match from some subsequent closing brace.
>>
>> An example is the `MessageFormat` string `"{0,choice,0.0#option A:
>> {1}|1.0#option B: {1}'}'}"`. Note the second option format string has a
>> trailing closing brace in plain text. If you create a `MessageFormat` with
>> this string, you see a trailing `}` only with the second option.
>>
>> However, if you then invoke `toPattern()`, the result is
>> `"{0,choice,0.0#option A: {1}|1.0#option B: {1}}}"`. Oops, now because the
>> "extra" closing brace is no longer quoted, it matches the opening brace at
>> the beginning of the string, and the following closing brace, which was the
>> previous match, is now just plain text in the outer `MessageFormat` string.
>>
>> As a result, invoking `f.format(new ...
>
> Archie Cobbs has updated the pull request incrementally with one additional
> commit since the last revision:
>
> Fix misspelling in comment.
Left some minor comments, but LGTM, I also ran this change against internal
tests and all is good there. Others may have suggestions, in any case, you will
still need a review from someone with 'reviewer' status.
test/jdk/java/text/Format/MessageFormat/MessageFormatToPatternTest.java line 26:
> 24: /*
> 25: * @test
> 26: * @summary Verify MessageFormat.toPattern() is equivalent to original
> pattern
Could make the summary a little more specific for future readers
test/jdk/java/text/Format/MessageFormat/MessageFormatToPatternTest.java line 70:
> 68: @BeforeAll
> 69: public static void setup() {
> 70: savedLocale = Locale.getDefault();
I'm not sure we need to save the default locale and restore it, unless I'm
missing something.
test/jdk/java/text/Format/MessageFormat/MessageFormatToPatternTest.java line
104:
> 102: Arguments.of("{0,choice,0.0#option A: {0}|1.0#option B:
> {0}'}'}", "option B: 1.23}"),
> 103: Arguments.of("{0,choice,0.0#option A: {0}|2.0#option B:
> {0}'}'}", "option A: 1.23"),
> 104:
Suggestion:
// Absurd double quote examples
Arguments.of("Foo '}''''''''}' {0,number,bar'}' '}' } baz ", "Foo }''''} bar} }
1 baz "),
Arguments.of("'''}''{'''}''''}'"), "'}'{'}''}"),
test/jdk/java/text/Format/MessageFormat/MessageFormatsByArgumentIndex.java line
35:
> 33: import java.text.NumberFormat;
> 34:
> 35: public class MessageFormatsByArgumentIndex {
Can you bump the latter copyright year for this file
test/jdk/java/text/Format/MessageFormat/MessageRegression.java line 114:
> 112: * MessageFormat.toPattern has weird rounding behavior.
> 113: */
> 114: @Test
This file as well
-------------
Marked as reviewed by jlu (Committer).
PR Review: https://git.openjdk.org/jdk/pull/17416#pullrequestreview-1857862268
PR Review Comment: https://git.openjdk.org/jdk/pull/17416#discussion_r1475280475
PR Review Comment: https://git.openjdk.org/jdk/pull/17416#discussion_r1475279603
PR Review Comment: https://git.openjdk.org/jdk/pull/17416#discussion_r1475294193
PR Review Comment: https://git.openjdk.org/jdk/pull/17416#discussion_r1475277825
PR Review Comment: https://git.openjdk.org/jdk/pull/17416#discussion_r1475278147