Re: [10] RFR 8176841: Additional Unicode Language-Tag Extensions

Naoto Sato Mon, 20 Nov 2017 17:46:57 -0800

Hi Stephen, please see my comments below.

On 11/20/17 5:27 AM, Stephen Colebourne wrote:

I've had a longer think about how to integrate this. Its very tricky,
as the unicode extensions create complex conflicts.


In general, my view is that Locale is too complex and overloaded with
different levels of meaning. Perhaps a different class should have
been added for more complex localization settings. It seems like an
are rife for puzzlers/security issues as unicode extensions are going
to be hugely undertested. (Most developers think of a Locale as
language+territory, and will only test for that, if they test at all).
It will be all too easy for text to appear on web pages or reports in
an unexpected calendar system or time-zone as a result of this change.
It also means that numbers might be output in an unexpected numbering
system or currency.

The tension is between what the developer has thought of and tested
for, and what a user might pass in. The mechanism for passing in these
extensions is partly subtle, being via the default locale. Although
even via an explicit locale, I suspect very few developers will be
sanitizing a Locale input from a user.

AFAICT, unicode extensions are not widely used at present. I can only
find use of "ca" in Chronology.ofLocale() and Calendar.getInstance().
So this is the key moment for deciding how they are used.

We are discussing 6 unicode extensions: "ca" (calendar system), "tz"
(time-zone), "cu" (currency), "nu" (numbering system), "fw" (first day
of week) and "rg" (region override).

Chronology, ZoneId, TimeZone and Currency ("ca", "tz", "cu") are not
really aspects of formatting at all, they are explicit aspects of the
associated value. eg. a ZonedDateTime explicitly has a chronology of
IsoChronology and a specific time zone. A Money class (Java will have
one at some point, probably after value types are added) explicitly
has a Currency.

You *really, really, really* don't want a unicode extension locale to
be changing the format of a Money object - turning $300 to £300 just
because the user had -u-cu-gbp in their locale. The same applies to
the formatting of dates and times - the chronology and time-zone are
part of the value being formatted, not the format.

This just leaves "nu" (numbering system) where there is a case for the
unicode extension to be picked up directly. But doing so would cause a
LocalDate to be output as ????-??-?? (where ? is the digit for another
numbering system), While I think that should be supported, should it
be picked up automatically via the locale? No, it should be explicitly
selected by using DateTimeFormatter.withDecimalStyle() - which the
webrev does do.

The distinction here is that DateTimeFormatter (and our theoretical
MoneyFormatter) are formatting the whole state of the value object,
whereas DateFormatter and NumberFormatter essentially just format a
single number, using extra information alongside to help.

So, what to do? Some of these are new changes, some just need testing:

1) The changes to `Calendar.getInstance()` are not documented wrt
"tz". Currently it says it results in the "default time zone", which
isn't true anymore. I think this may be a behavioural compatibility
issue.


Explicitly document that it will use the time zone in locale.


2) `DecimalStyle.ofLocale(Locale)` should use "nu" but does not.


Document it in the javadoc.


3) `DateTimeFormatter.localizedBy(Locale)` should use "ca" to call
`withChronology`, `tz` to call `withZoneId` and `nu` to call
`withDecimalStyle`. This is a change to the CSR.

Besides that "nu" needs to be spec'ed out, isn't calling withXXXX() animplementation note?


4) The phrase "Unicode extensions in the locale are ignored." would
apply to lots of methods across the JDK. In particular for this
webrev, it would need to be added to a lot more methods, such as the
default locale methods. Perhaps the phrase could be "If the locale has
unicode extensions, they are not used", which is slightly easier to
read. Or perhaps it shouldn't be added at all in most cases (ie.
unless unicode extensions are mentioned, assume that they are not
used.)


Removed the phrase.


5) The withLocale(Locale) change should say "The locale is stored as
passed in, without further processing. If the locale has unicode
extensions, they may be used later in text processing. To set the
chronology, time-zone and decimal style from unicode extensions, see
localizedBy()."


Added.


6) The "rg" extension (and no other extensions) should be used when
looking up data to output these:
- DateTimeFormatterBuilder.appendText()
- DateTimeFormatterBuilder.appendLocalizedOffset()
- DateTimeFormatterBuilder.appendZoneText()
- DateTimeFormatterBuilder.appendChronologyText()
- DateTimeFormatterBuilder.appendLocalized()
- DateTimeFormatterBuilder.getLocalizedDateTimePattern()
This may be the case, but tests should ensure it.


Will need to add tests.


7) WeekBasedFieldPrinterParser should use "fw"/"rg", which it already
does via WeekFields.of(Locale)


Not sure what this means. Where is the file located?


8) For clarity of future code, two additional methods would be useful:
- DateTimeFormatterBuilder.toFormatterIsoRoot()
- DateTimeFormatterBuilder.toFormatterIso(Locale)
These would set the chronology of the resulting DTF to be
IsoChronology.INSTANCE. The "root" variant would use `Locale.ROOT`.


This would be a nice addition.


9) Consider more generally what happens if TimeZone.getDefault() does
not match the "tz" extension of Locale.getDefault().

This should be considered case by case, and document on each occasionwhich would be honored.


10) Consider how localizedBy(Locale) operates. Is it the same as
formatter
   .withChronology(Chronology.ofLocale(loc))
   .withZoneId(ZoneId.ofLocale(loc))
   .withDecimalStyle(DecimalStyle.of(loc))
or does it only set the chronology if "ca" if found, and only set
time-zone if "tz" is found and only set decimal style if "nu" is
found.

IIRC, the localizedBy() is added so that withLocale() would behave as itis now. I think localizedBy() should also have the same effect aswithLocale if the specified locale do not contain anycalendar/timezone/numberingSystem extensions. Otherwise, saylocalizedBy(Locale.JAPAN) would be no-operation.


As we discussed in the previous email, we don't have ZoneId.ofLocale() yet.


Perhaps this webrev should be broken into two or more parts, as it is
very large?


In summary, going forward it should be recognised that there is a
separation/difference between formatters of full values
(Money/Temporal) and old-style formatters of single numbers
(DateFormat, NumberFormat). Some extensions, "ca", "tz", "cu", are
part of the value in Money/Temporal, and so can't be overridden by the
locale. Methods like Chronology.ofLocale or ZoneId.of(Locale) are good
places to use the extensions, as they are explicit about what happens.

Here is the spec diff from the previous one. Please let me know yourthought.


http://cr.openjdk.java.net/~naoto/8191349/specdiff.03/overview-summary.html

Naoto


Stephen


On 15 November 2017 at 23:31, Naoto Sato <[email protected]> wrote:

Thanks, Lance. Corrected as suggested.

http://cr.openjdk.java.net/~naoto/8191349/specdiff.02/overview-summary.html
Also, I inserted "@since 10."

Naoto

On 11/15/17 3:06 PM, Lance Andersen wrote:


Hi Naoto

localizedBy, i would suggest changing:

-   “If the new locale contains “ca”…”  to “if the new locale contains the
“ca”…”
- “Unlike withLocale method” to Unlike the withLocale method”

Best
Lance


On Nov 15, 2017, at 5:36 PM, Naoto Sato <[email protected]
<mailto:[email protected]>> wrote:


http://cr.openjdk.java.net/~naoto/8191349/specdiff.01/overview-summary.html



<http://oracle.com/us/design/oracle-email-sig-198324.gif>

<http://oracle.com/us/design/oracle-email-sig-198324.gif><http://oracle.com/us/design/oracle-email-sig-198324.gif>
<http://oracle.com/us/design/oracle-email-sig-198324.gif>Lance Andersen|
Principal Member of Technical Staff | +1.781.442.2037
Oracle Java Engineering
1 Network Drive
Burlington, MA 01803
[email protected] <mailto:[email protected]>

Re: [10] RFR 8176841: Additional Unicode Language-Tag Extensions

Reply via email to