This PR corrects the Locale parsing logic for extended language subtags 
(extlangs). The BCP 47 syntax allows extlangs to follow only `2*3ALPHA` 
languages. This is reinforced by the syntax comment in `LanguageTag.parse` 
(which is based on the BNF). However, the current implementation does not 
respect this, and allows extlangs to follow `4ALPHA` (reserved for future use) 
as well as `5*8ALPHA` languages.
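
To make the grammar rule concrete, here is a minimal standalone sketch of the 
intended check. It is not the code in this patch: the class 
`ExtlangRuleSketch` and its helper names are hypothetical, and it only models 
the `language` production (`2*3ALPHA ["-" extlang]`, where `extlang` is up to 
three `3ALPHA` subtags), not the rest of the tag.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of the BCP 47 rule; not the JDK implementation.
public class ExtlangRuleSketch {

    // True if s is non-empty and consists only of ASCII letters.
    static boolean isAlpha(String s) {
        if (s.isEmpty()) {
            return false;
        }
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            boolean letter = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
            if (!letter) {
                return false;
            }
        }
        return true;
    }

    // 2*3ALPHA: the only primary language form that may be followed by extlangs.
    static boolean isShortLanguage(String s) {
        return (s.length() == 2 || s.length() == 3) && isAlpha(s);
    }

    // extlang subtag: exactly 3ALPHA.
    static boolean isExtlang(String s) {
        return s.length() == 3 && isAlpha(s);
    }

    // Collects the extlang subtags that may grammatically follow the primary
    // language. Returns an empty list for 4ALPHA and 5*8ALPHA languages.
    static List<String> extlangsOf(String tag) {
        String[] subtags = tag.split("-");
        List<String> extlangs = new ArrayList<>();
        if (!isShortLanguage(subtags[0])) {
            return extlangs;
        }
        // At most three extlang subtags are permitted by the grammar.
        for (int i = 1; i < subtags.length && extlangs.size() < 3; i++) {
            if (!isExtlang(subtags[i])) {
                break;
            }
            extlangs.add(subtags[i]);
        }
        return extlangs;
    }

    public static void main(String[] args) {
        System.out.println(extlangsOf("zh-yue"));   // [yue]
        System.out.println(extlangsOf("quux-bar")); // []
    }
}
```

In this sketch, a `3ALPHA` subtag after a 4 to 8 letter language is simply not 
treated as an extlang, which is the distinction the current parser misses.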

For example, `Locale.forLanguageTag("quux-bar").toLanguageTag()` currently 
returns "bar" (the extlang) when it should return "quux" (the language) and 
discard the extlang "bar".
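
A small reproducer for the case above, using only the public `Locale` API. The 
class name `ExtlangRepro` and the `roundTrip` helper are just for 
illustration. No expected output is hard-coded because it depends on whether 
the build includes this change: per the description above, it should print 
"bar" without the fix and "quux" with it.

```java
import java.util.Locale;

// Hypothetical reproducer; prints the round-tripped language tag.
public class ExtlangRepro {

    // Parses a BCP 47 tag into a Locale and converts it back to a tag.
    static String roundTrip(String tag) {
        return Locale.forLanguageTag(tag).toLanguageTag();
    }

    public static void main(String[] args) {
        // "quux" is a 4ALPHA language, so "bar" must not be parsed as its extlang.
        System.out.println(roundTrip("quux-bar"));
    }
}
```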

This is likely an oversight and should be fixed rather than kept and specified 
as a BCP 47 deviation, since it is non-standard for extlangs to follow the 
longer language subtags mentioned above.

I can file a release note if deemed warranted, since the set of accepted inputs 
shrinks as a result (even though the new behavior is correct). Personally, I 
would lean towards not filing one, since such inputs are already non-standard: 
there are no extlangs that follow a language prefix that is not 2 or 3 letters 
long.

---------
- [x] I confirm that I make this contribution in accordance with the [OpenJDK 
Interim AI Policy](https://openjdk.org/legal/ai).

-------------

Commit messages:
 - init

Changes: https://git.openjdk.org/jdk/pull/31663/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=31663&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8387253
  Stats: 21 lines in 2 files changed: 17 ins; 0 del; 4 mod
  Patch: https://git.openjdk.org/jdk/pull/31663.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/31663/head:pull/31663

PR: https://git.openjdk.org/jdk/pull/31663
