On Mon, 13 Oct 2025 21:51:32 GMT, Justin Lu <[email protected]> wrote:

>> This PR corrects _test/jdk/java/util/Locale/LocaleEnhanceTest.java_, which
>> has two test cases under `testBuilderSetLanguageTag()` that pass by
>> accident. One checks that `Locale.Builder.setLanguageTag(String)` throws
>> `IllformedLocaleException` (ILE) for duplicate extensions and the other for
>> duplicate U-extension keys. The test cases are updated to actually test the
>> provided code, and once fixed, they fail.
>> 
>> Fixing the behavior to match the expectation of those test cases is 
>> consistent with the specification.
>> 
>> From the `Locale.forLanguageTag(String)` javadoc:
>> 
>>> 
>>>      * <p>If the specified language tag contains any ill-formed subtags,
>>>      * the first such subtag and all following subtags are ignored.  Compare
>>>      * to {@link Locale.Builder#setLanguageTag(String)} which throws an exception
>>>      * in this case.
>> 
>> and RFC 5646 (BCP 47), Section 2.2.6:
>> 
>>> Each singleton subtag MUST appear at most one time in each tag
>>>        (other than as a private use subtag).  That is, singleton subtags
>>>        MUST NOT be repeated.  For example, the tag "en-a-bbb-a-ccc" is
>>>        invalid because the subtag 'a' appears twice.
>> 
>> Since duplicate extensions (and Unicode keys/attributes) are invalid, 
>> throwing `IllformedLocaleException` in (the strict) `Locale.Builder` and 
>> ignoring in (the lenient) `Locale.forLanguageTag` for such tags would be 
>> appropriate. This PR updates the implementation as such.
>
> Justin Lu has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Adding test case to confirm duplicate U-extension attributes for
>   setExtension(char, String)

IIUC, the quoted RFC rule is about duplicate singletons. For example, it
would reject something like `-u-aa-bbb-u-cc-ddd`, where the singleton `u`
appears twice. So I believe that rule doesn't apply to cases like
`-u-aa-bbb-AA-ccc`, where only the key `aa` is repeated (subtags are
case-insensitive) within a single `-u` extension. I checked the `-u` extension
definition in LDML (UTS #35) but couldn't find any description regarding
duplicate keywords.

That said, I think it makes sense to allow them in lenient mode
(`Locale.forLanguageTag`) and throw an exception in strict mode
(`Locale.Builder`). Since this would introduce a behavioral change, I'd
expect it to require a CSR (Compatibility and Specification Review).
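A minimal sketch of the strict/lenient split under discussion, using only the public `Locale.Builder.setLanguageTag` and `Locale.forLanguageTag` methods; the class and method names are illustrative. Since what the builder does with these tags depends on whether the JDK build includes this change, the sketch only reports the outcome rather than asserting one.

```java
import java.util.IllformedLocaleException;
import java.util.Locale;

// Illustrative sketch: strict Locale.Builder vs lenient Locale.forLanguageTag.
// The builder's result depends on the JDK build, so it is only reported.
final class StrictVsLenient {

    // Describes how the strict builder handles the tag.
    static String strict(String tag) {
        try {
            Locale loc = new Locale.Builder().setLanguageTag(tag).build();
            return "accepted as " + loc.toLanguageTag();
        } catch (IllformedLocaleException e) {
            return "IllformedLocaleException: " + e.getMessage();
        }
    }

    // Locale.forLanguageTag does not throw IllformedLocaleException;
    // it returns a best-effort Locale instead.
    static String lenient(String tag) {
        return Locale.forLanguageTag(tag).toLanguageTag();
    }

    public static void main(String[] args) {
        for (String tag : new String[] {"en-u-aa-bbb-u-cc-ddd", "en-u-aa-bbb-AA-ccc"}) {
            System.out.println(tag);
            System.out.println("  strict : " + strict(tag));
            System.out.println("  lenient: " + lenient(tag));
        }
    }
}
```

Running it on builds with and without this change should show where the strict behavior differs, which is the kind of compatibility impact a CSR would record.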

-------------

PR Comment: https://git.openjdk.org/jdk/pull/27775#issuecomment-3399242375
