Hi Lothar,

The issue you are seeing is one we have been aware of for a long time: https://bugs.openjdk.org/browse/JDK-8194289

The difference between JDK8 and JDK11 comes from the difference between the JDK's legacy locale data and the CLDR locale data provided by the Unicode Consortium. The CLDR locale data became the default locale data in JDK9 (https://openjdk.org/jeps/252).

The locale data changes from time to time, from small changes such as translation updates to more significant ones, so the best course of action is not to rely on any specific locale data. However, I see that you need to parse dates that come from outside the JDK, so I understand you will need to adapt to whatever date format the app receives.

As an immediate temporary solution, you could choose the legacy locale data over CLDR by using the system property `java.locale.providers=COMPAT` mentioned in those JBS issues. For the longer term, you might want to implement `java.text.spi.DateFormatProvider` for your specific needs (in this case, parsing abbreviated months without the trailing dot in German).
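
As an illustration of handling this in application code rather than through the SPI, here is a minimal sketch that parses the "18. Mär 2023" style without depending on any locale data. It uses `DateTimeFormatterBuilder.appendText` with an explicit month table. The class name, the field names, and the abbreviation table itself are assumptions made for this example and would need to be adjusted to the data actually received.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.time.temporal.ChronoField;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper: parses dates such as "18. Mär 2023" with a fixed month table.
class GermanShortMonthParser {

    // Assumed abbreviations without trailing dot ("M\u00e4r" is "Mär").
    private static final String[] SHORT_MONTHS = {
        "Jan", "Feb", "M\u00e4r", "Apr", "Mai", "Jun",
        "Jul", "Aug", "Sep", "Okt", "Nov", "Dez"
    };

    static final DateTimeFormatter EBAY_STYLE = build();

    private static DateTimeFormatter build() {
        Map<Long, String> months = new HashMap<>();
        for (int i = 0; i < SHORT_MONTHS.length; i++) {
            months.put((long) (i + 1), SHORT_MONTHS[i]);
        }
        return new DateTimeFormatterBuilder()
                .appendPattern("d. ")                          // day, literal dot and space
                .appendText(ChronoField.MONTH_OF_YEAR, months) // month text from our own table
                .appendPattern(" uuuu")                        // literal space, four-digit year
                .toFormatter(Locale.ROOT);
    }

    static LocalDate parse(String text) {
        return LocalDate.parse(text, EBAY_STYLE);
    }

    public static void main(String[] args) {
        System.out.println(parse("18. M\u00e4r 2023")); // prints 2023-03-18
    }
}
```

Because the month names come from the table above, the result should not change when the JDK's locale data changes. Matching is case-sensitive unless `parseCaseInsensitive()` is added to the builder.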

Please note that the system property value `COMPAT` was a temporary measure for migration, so it is deprecated as of JDK21 and will eventually be removed in a future JDK release.

Also, we are in the process of revising the old release notes wrt the CLDR compatibility issues so that users will know them beforehand.

HTH,
Naoto

On 10/10/23 3:47 AM, Lothar Kimmeringer wrote:
Hi,

my first mail in this list, so please be gentle ;-)

I've encountered issues when trying to keep date parsing functionality when
migrating from Java 8 to Java 11. This happened a while ago and I implemented
local workarounds, but with installations using more recent versions of
Java 11 things broke again, so I'm not sure if I'm simply doing things wrong
or if there are actual bugs.

I've attached my JUnit-class that contains the different issues (not as single
tests, but I will highlight them here in this mail). Here SimpleDateFormat is
used, but I've added a test that uses DateTimeFormatter to make sure that it's
not the use of old classes and that the problem persists in the new API as
well.

Most issues come up when trying to parse abbreviated months with Locales
different from "en". Our use case is that data with the same date layout
but different Locales are parsed (e.g. Ebay revenue summary CSV-files or
FTP servers on German Windows installations). The dates used there are of
the form

  - Ebay: 18. Mär 2023
  - FTP server: Mär 14  2022

This worked well with MMM in the template up to Java 8. Then LLL was
introduced, and MMM now leads to four-character abbreviations that include
a dot. Btw: I think the Javadoc that explains the template-parts
(e.g. in SimpleDateFormat) should have an additional column containing an example
for a non-EN-Locale, because

M    Month in year (context sensitive)  Month  July; Jul; 07
L    Month in year (standalone form)    Month  July; Jul; 07

isn't helping at all to see the effect of these two template-parts, so e.g.

M    Month ... (context...)             Month   January;...   Januar; Jan.; 01
L    Month ... (standalone...)          Month   January;...   Januar; Jan; 01

might be better for understanding it.

With the use of LLL all tests with dates without a dot can now be parsed
again using the same mask. But it's not possible to parse a date where
the month is always abbreviated with a dot in a consistent way, e.g.

23. Dez. 2016 11:12:13.456
using the template
dd. LLL. yyyy HH:mm:ss.SSS

It works with Locale en (with "Dec" as month of course) but not with "de".

The reason is that SimpleDateFormat uses all month display names when
parsing "month standalone". That also includes the abbreviated months with
dots. Because these are in general longer than their standalone
counterparts (except three-letter months like "Mai" in German),
matchString considers them the best match, "consuming" the dot in the text to
be parsed, which is then missing when the parsing continues.

DateTimeFormatter seems to work differently because it's not failing at that
point (I haven't debugged it), but it fails when trying to parse Russian dates
without abbreviating dots. I assume that is because the ru-Locale doesn't
seem to have values for the standalone month. I could live with that given
our user base, but the parser in java.time runs into problems when parsing
a time with milliseconds: you need to provide as many "S" as there are digits
in the value:

  - "23. Dec. 2016 11:12:13.456" needs "dd. LLL. yyyy HH:mm:ss.SSS",
    it doesn't work with "dd. LLL. yyyy HH:mm:ss.S"
  - "23. Dec. 2016 11:12:13.4"   needs "dd. LLL. yyyy HH:mm:ss.S",
    it doesn't work with "dd. LLL. yyyy HH:mm:ss.SSS"

When handling data from different sources, where one source cuts away
trailing zeros and the other doesn't, you essentially need to inspect the
text before parsing it in order to choose the correct template.

SimpleDateFormat parses the date correctly independent from the
number of "S" in the template and the actual number of digits
in the text to be parsed.
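
One possible way around this in java.time, sketched here as an assumption rather than a confirmed recommendation, is to build the formatter with `DateTimeFormatterBuilder.appendFraction` instead of a fixed run of "S" letters. The example covers only the time part so that it does not depend on month names from any locale data. The class and field names are hypothetical.

```java
import java.time.LocalTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.time.temporal.ChronoField;
import java.util.Locale;

// Hypothetical helper: accepts one to three fractional-second digits.
class FlexibleFractionParser {

    static final DateTimeFormatter TIME = new DateTimeFormatterBuilder()
            .appendPattern("HH:mm:ss")
            // min 1 digit, max 3 digits, with a leading decimal point
            .appendFraction(ChronoField.NANO_OF_SECOND, 1, 3, true)
            .toFormatter(Locale.ROOT);

    static LocalTime parse(String text) {
        return LocalTime.parse(text, TIME);
    }

    public static void main(String[] args) {
        System.out.println(parse("11:12:13.456")); // prints 11:12:13.456
        System.out.println(parse("11:12:13.4"));   // prints 11:12:13.400
    }
}
```

Note that the digits are read as a decimal fraction of a second here, so ".4" means 400 milliseconds.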

While my lengthy explanation of the problems with LLL might result in
the answer "not a bug, go away" ;-) I definitely see the milliseconds with
java.time.* as one.


Thanks for reading this far and best regards,

Lothar Kimmeringer
