On Mon, 4 Aug 2025 23:57:28 GMT, Naoto Sato <na...@openjdk.org> wrote:
>> Enabling lenient minus sign matching when parsing numbers. In some locales, >> e.g. Finnish, the default minus sign is the Unicode "Minus Sign" (U+2212), >> which is not the "Hyphen Minus" (U+002D) that users type in from keyboard. >> Thus the parsing of user input numbers may fail. This change utilizes CLDR's >> `parseLenient` element for minus signs and loosely matches them with the >> hyphen-minus so that user input numbers can parse. As this is a behavioral >> change, a corresponding CSR has been drafted. > > Naoto Sato has updated the pull request incrementally with one additional > commit since the last revision: > > refrects review comments Latest changes look good to me. I think ignoring supplementary/normalization is fine and it would have been excessive otherwise. I left some more minor comments. src/java.base/share/classes/java/text/CompactNumberFormat.java line 219: > 217: * </pre></blockquote> > 218: * > 219: * @implNote The implementation follows the LDML specification to enable > loose Suggestion: * @implNote The JDK Reference Implementation follows the LDML specification to enable loose src/java.base/share/classes/java/text/DecimalFormat.java line 421: > 419: * returns a numerically greater value. > 420: * > 421: * @implNote The implementation follows the LDML specification to enable > loose Suggestion: * @implNote The JDK Reference Implementation follows the LDML specification to enable loose src/java.base/share/classes/java/text/DecimalFormatSymbols.java line 719: > 717: > 718: /** > 719: * {@return the lenient minus signs} Multiple lenient minus signs Do we have an idea of when a given locale would not have access to the lenient minus signs provided by parseLenient element? If the vast majority of times it will, then it is fine. Otherwise IMO, `getLenientMinusSigns()` should probably call out the fact that it may not always be a concatenation of multiple minus _signs_, and it may be a single minus _sign_ (as assigned by `minusSignText`). Since that detail is only apparent by digging around the code that assigns `lenientMinusSigns`. src/java.base/share/classes/java/text/DecimalFormatSymbols.java line 720: > 718: /** > 719: * {@return the lenient minus signs} Multiple lenient minus signs > 720: * are concatenated to form the returned string. The surrounding package private methods use since tags. Suggestion: * are concatenated to form the returned string. * @since 26 src/java.base/share/classes/java/text/DecimalFormatSymbols.java line 852: > 850: > 851: // Lenient minus signs > 852: lenientMinusSigns = numberElements.length < 14 ? minusSignText : > numberElements[13]; BTW, if I remove this check and always assign to `numberElements[13]`, I do not observe any failures in the java_text/Format suite. It would be nice to have an idea of why this check is needed. (I understand it is following the same length checks of monetarySeparator and monetaryGroupingSeparator.) ------------- PR Review: https://git.openjdk.org/jdk/pull/26580#pullrequestreview-3086137609 PR Review Comment: https://git.openjdk.org/jdk/pull/26580#discussion_r2252827372 PR Review Comment: https://git.openjdk.org/jdk/pull/26580#discussion_r2254807654 PR Review Comment: https://git.openjdk.org/jdk/pull/26580#discussion_r2254847469 PR Review Comment: https://git.openjdk.org/jdk/pull/26580#discussion_r2254792967 PR Review Comment: https://git.openjdk.org/jdk/pull/26580#discussion_r2254846860