Re: Date parsing issues with SimpleDateFormat and DateTimeFormatter

Naoto Sato Tue, 10 Oct 2023 10:29:38 -0700

Hi Lothar,

The issue you are seeing here is one we have been aware of for a long time: https://bugs.openjdk.org/browse/JDK-8194289

The difference between JDK8 and JDK11 comes from the difference between JDK's legacy locale data and the CLDR locale data provided by the Unicode Consortium. The CLDR locale data became the default locale data in JDK9 (https://openjdk.org/jeps/252)

The locale data changes from time to time, from small ones such as translation changes to the somewhat significant ones, so the best course of action is not relying on any specific locale data. However, I see your situation is to parse dates from outside the JDK, so I understand you will need to adapt to whatever the date format the app receives.
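
For illustration only, here is a minimal sketch of what "not relying on any specific locale data" could look like with java.time: the month abbreviations are spelled out in the application and handed to `DateTimeFormatterBuilder.appendText(TemporalField, Map)`. The class name, the method name and the abbreviation table are assumptions made up for this example; the table simply mirrors the undotted German forms from the report.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.time.temporal.ChronoField;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

final class FixedMonthNames {
    // Hypothetical lookup table: the abbreviations the application expects,
    // written out explicitly instead of being taken from the JDK's locale data.
    // "\u00e4" is a-umlaut, escaped to avoid source-encoding problems.
    private static final String[] GERMAN_SHORT_MONTHS = {
        "Jan", "Feb", "M\u00e4r", "Apr", "Mai", "Jun",
        "Jul", "Aug", "Sep", "Okt", "Nov", "Dez"
    };

    // Builds a formatter for dates of the form "18. Mär 2023".
    static DateTimeFormatter ebayStyleFormatter() {
        Map<Long, String> months = new LinkedHashMap<>();
        for (int i = 0; i < GERMAN_SHORT_MONTHS.length; i++) {
            months.put((long) (i + 1), GERMAN_SHORT_MONTHS[i]);
        }
        return new DateTimeFormatterBuilder()
                .parseCaseInsensitive()
                .appendPattern("dd. ")
                .appendText(ChronoField.MONTH_OF_YEAR, months) // explicit texts, no locale lookup
                .appendPattern(" yyyy")
                .toFormatter(Locale.ROOT);
    }

    public static void main(String[] args) {
        System.out.println(LocalDate.parse("18. M\u00e4r 2023", ebayStyleFormatter()));
    }
}
```

Because the month texts come from the map, the result should not change when the bundled locale data does.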

As the immediate temporary solution, you could choose the legacy locale data over CLDR, by using the system property `java.locale.providers=COMPAT` mentioned in those JBS issues. For a longer term, you might want to implement `java.text.spi.DateFormatProvider` for your specific needs (in this case, parse abbreviated months without the trailing dot in German).
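
A full `DateFormatProvider` needs service registration, which is too much to show in a mail. As a smaller variation on the same idea (not the SPI itself), the sketch below passes an explicit `DateFormatSymbols` to `SimpleDateFormat`, with the short month names overridden to the undotted forms. The class name, helper names and the month array are assumptions for the example.

```java
import java.text.DateFormatSymbols;
import java.text.ParsePosition;
import java.text.SimpleDateFormat;
import java.time.LocalDate;
import java.time.ZoneId;
import java.util.Date;
import java.util.Locale;

final class CustomShortMonths {
    // Returns a SimpleDateFormat whose short month names are fixed by the
    // application (hypothetical list, "\u00e4" is a-umlaut).
    static SimpleDateFormat germanNoDotFormat(String pattern) {
        DateFormatSymbols symbols = new DateFormatSymbols(Locale.GERMAN);
        symbols.setShortMonths(new String[] {
            "Jan", "Feb", "M\u00e4r", "Apr", "Mai", "Jun",
            "Jul", "Aug", "Sep", "Okt", "Nov", "Dez"
        });
        return new SimpleDateFormat(pattern, symbols);
    }

    // Parses the text and returns the calendar date in the default time zone.
    static LocalDate parse(String pattern, String text) {
        Date date = germanNoDotFormat(pattern).parse(text, new ParsePosition(0));
        if (date == null) {
            throw new IllegalArgumentException("Unparseable date: " + text);
        }
        return date.toInstant().atZone(ZoneId.systemDefault()).toLocalDate();
    }

    public static void main(String[] args) {
        System.out.println(parse("dd. MMM yyyy", "18. M\u00e4r 2023"));
    }
}
```

Only the short month names are overridden here; everything else in the symbols still comes from the locale data for `Locale.GERMAN`.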

Please note that the system property value `COMPAT` was a temporary measure for migration, so it is deprecated as of JDK21 and will eventually be removed in a future JDK.

Also, we are in the process of revising the old release notes wrt the CLDR compatibility issues so that users will know them beforehand.


HTH,
Naoto

On 10/10/23 3:47 AM, Lothar Kimmeringer wrote:

Hi,

my first mail in this list, so please be gentle ;-)

I've encountered issues when trying to keep date parsing functionality when
migrating from Java 8 to Java 11. This happened a while ago and I implemented local workarounds but with installations using more recent versions of Java 11 things broke again so I'm not sure if I'm simply doing things wrong or if there
are actual bugs.
I've attached my JUnit-class that contains the different issues (not as single tests but I will highlight them here in this mail). Here SimpleDateFormat is
used but I've added a test to use DateTimeFormatter to make sure that it's
not the use of old classes and that the problem persists in the new API as
well.

Most issues come up when trying to parse abbreviated months with Locales
different from "en". Our use case is that data with the same date layout
but different Locales are parsed (e.g. Ebay revenue summary CSV-files or FTP servers on German Windows installations). The dates used there are of the form
  - Ebay: 18. Mär 2023
  - FTP server: Mär 14  2022
This worked well with MMM in the template till Java 8; then LLL got introduced and MMM now leads to four letters being used for the abbreviation
including a dot. Btw: I think the Javadoc that explains the template-parts
(e.g. in SimpleDateFormat) should have an additional column containing an example
for a non-EN-Locale, because

M    Month in year (context sensitive)  Month  July; Jul; 07
L    Month in year (standalone form)    Month  July; Jul; 07

isn't helping at all to see the effect of these two template-parts, so e.g.

M Month ... (context...)    Month January;... Januar;Jan.; 01
L Month ... (standalone...) Month January;... Januar;Jan; 01

might be better for understanding it.

With the use of LLL all tests with dates without a dot can now be parsed
again using the same mask. But it's not possible to parse a date where
the month is always abbreviated with a dot in a consistent way, e.g.

23. Dez. 2016 11:12:13.456
using the template
dd. LLL. yyyy HH:mm:ss.SSS

It works with Locale en (with "Dec" as month of course) but not with "de".

Reason is that SimpleDateFormat is using all month display names when
parsing "month standalone". That also includes the abbreviated month including
dots. Because these months are in general longer than their standalone
counterparts (except three-letter months like "Mai" in german)
matchString considers this as best match, "consuming" the dot in the text to
be parsed which is now missing when the parsing continues.
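
The effect described above can be reproduced in isolation. The following is a simplified model of a longest-match lookup, not the JDK's actual matchString code; the class, the method names and the candidate lists are assumptions chosen to mirror the "Dez" / "Dez." case.

```java
import java.util.Arrays;
import java.util.List;

final class LongestMatchDemo {
    // Returns the longest candidate found in the text at the given offset
    // (ignoring case), or null if none matches.
    static String longestMatch(String text, int start, List<String> candidates) {
        String best = null;
        for (String candidate : candidates) {
            if ((best == null || candidate.length() > best.length())
                    && text.regionMatches(true, start, candidate, 0, candidate.length())) {
                best = candidate;
            }
        }
        return best;
    }

    // True if the literal required by the template directly follows the matched month.
    static boolean literalFollows(String text, int monthStart, List<String> names, String literal) {
        String month = longestMatch(text, monthStart, names);
        return month != null && text.startsWith(literal, monthStart + month.length());
    }

    public static void main(String[] args) {
        String text = "23. Dez. 2016";
        // Assumed candidate list: both the undotted and the dotted abbreviation.
        List<String> both = Arrays.asList("Dez", "Dez.");
        System.out.println(longestMatch(text, 4, both));          // Dez.
        System.out.println(literalFollows(text, 4, both, "."));   // false
        // With only the undotted form, the dot is left for the template literal.
        System.out.println(literalFollows(text, 4, Arrays.asList("Dez"), ".")); // true
    }
}
```

In this model the dotted name wins because it is longer, so the "." literal that follows LLL in the template has nothing left to match.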

DateTimeFormatter seems to work differently because it's not failing at that
point (haven't debugged it) but is failing when trying to parse Russian dates
without abbreviating dots. I assume that is because the ru-Locale doesn't
seem to have values for the standalone month. I could live with that given
our user base but the parser in java.time runs into problems when parsing
a time with milliseconds: You need to provide as many "S" as there are digits
in the value:

  - "23. Dec. 2016 11:12:13.456" needs "dd. LLL. yyyy HH:mm:ss.SSS",
    it doesn't work with "dd. LLL. yyyy HH:mm:ss.S"
  - "23. Dec. 2016 11:12:13.4"   needs "dd. LLL. yyyy HH:mm:ss.S",
    it doesn't work with "dd. LLL. yyyy HH:mm:ss.SSS"

When handling data from different sources where one source is cutting
away trailing zeros and the other isn't you essentially need to inspect
the text first in order to pick the correct template for parsing it.
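
One possible way around this in java.time, sketched here under the assumption that a builder can be used instead of a pure pattern string, is `DateTimeFormatterBuilder.appendFraction`, which accepts a variable number of fraction digits. The example covers only the time part so that it does not depend on any month names; the class and constant names are made up.

```java
import java.time.LocalTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.time.temporal.ChronoField;
import java.util.Locale;

final class VariableFractionDemo {
    // "HH:mm:ss" followed by an optional decimal point and 0 to 9 fraction digits.
    static final DateTimeFormatter TIME = new DateTimeFormatterBuilder()
            .appendPattern("HH:mm:ss")
            .appendFraction(ChronoField.NANO_OF_SECOND, 0, 9, true)
            .toFormatter(Locale.ROOT);

    public static void main(String[] args) {
        System.out.println(LocalTime.parse("11:12:13.456", TIME)); // 11:12:13.456
        System.out.println(LocalTime.parse("11:12:13.4", TIME));   // 11:12:13.400
        System.out.println(LocalTime.parse("11:12:13", TIME));     // 11:12:13
    }
}
```

The same `appendFraction` call could follow a date pattern such as the one above, so a single formatter would accept both inputs.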

SimpleDateFormat parses the date correctly independent from the
number of "S" in the template and the actual number of digits
in the text to be parsed.

While my lengthy explanation of the problems with LLL might result in
the answer "not a bug, go away" ;-) I definitely see the milliseconds with
java.time.* as one.


Thanks for reading this far and best regards,

Lothar Kimmeringer
