Hi Lothar,
The issue you are seeing here is one we have been aware of for a long
time: https://bugs.openjdk.org/browse/JDK-8194289
The difference between JDK8 and JDK11 comes from the difference between
JDK's legacy locale data and the CLDR locale data provided by the
Unicode Consortium. The CLDR locale data became the default locale data
in JDK9 (https://openjdk.org/jeps/252).
Locale data changes from time to time, from small changes such as
translation fixes to more significant ones, so the best course of action
is not to rely on any specific locale data. However, I see that your
situation is parsing dates that come from outside the JDK, so I
understand you will need to adapt to whatever date format the app
receives.
As an immediate, temporary solution, you could choose the legacy locale
data over CLDR by setting the system property
`java.locale.providers=COMPAT` mentioned in those JBS issues. In the
longer term, you might want to implement
`java.text.spi.DateFormatProvider` for your specific needs (in this
case, parsing abbreviated months without the trailing dot in German).
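A minimal sketch of that idea, assuming dotless German abbreviations are what the data contains. The class `NoDotGermanDateFormats`, its helper `germanNoDot` and the nested `Provider` are hypothetical names, and the per-style patterns are placeholders. The helper builds a `SimpleDateFormat` with explicitly set `DateFormatSymbols`; the provider wraps it. Note that a registered `DateFormatProvider` only affects the `DateFormat` factory methods such as `DateFormat.getDateInstance`, not code that calls `new SimpleDateFormat(pattern, locale)` directly, which would use the helper instead. Registering the provider would need a `META-INF/services/java.text.spi.DateFormatProvider` file naming `NoDotGermanDateFormats$Provider` and `SPI` in the provider list (e.g. `-Djava.locale.providers=SPI,CLDR`).

```java
import java.text.DateFormat;
import java.text.DateFormatSymbols;
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.text.spi.DateFormatProvider;
import java.util.Date;
import java.util.Locale;

public class NoDotGermanDateFormats {

    // Assumed dotless German abbreviations ("M\u00e4r" is "Mär"); the
    // 13th, empty entry stands for Calendar.UNDECIMBER.
    static final String[] SHORT_MONTHS = {
        "Jan", "Feb", "M\u00e4r", "Apr", "Mai", "Jun",
        "Jul", "Aug", "Sep", "Okt", "Nov", "Dez", ""
    };

    // German SimpleDateFormat whose MMM field formats and parses the
    // dotless names above instead of the locale provider's names.
    static SimpleDateFormat germanNoDot(String pattern) {
        DateFormatSymbols symbols = new DateFormatSymbols(Locale.GERMAN);
        symbols.setShortMonths(SHORT_MONTHS);
        SimpleDateFormat format = new SimpleDateFormat(pattern, Locale.GERMAN);
        // Explicitly set symbols take precedence over the provider data.
        format.setDateFormatSymbols(symbols);
        return format;
    }

    // Hypothetical provider wrapping the helper; the patterns ignore the
    // requested style and are placeholders only.
    public static class Provider extends DateFormatProvider {
        @Override
        public Locale[] getAvailableLocales() {
            return new Locale[] { Locale.GERMAN };
        }

        @Override
        public DateFormat getDateInstance(int style, Locale locale) {
            return germanNoDot("dd. MMM yyyy");
        }

        @Override
        public DateFormat getTimeInstance(int style, Locale locale) {
            return germanNoDot("HH:mm:ss");
        }

        @Override
        public DateFormat getDateTimeInstance(int dateStyle, int timeStyle, Locale locale) {
            return germanNoDot("dd. MMM yyyy HH:mm:ss");
        }
    }

    public static void main(String[] args) throws ParseException {
        // Ebay-style and FTP-style dates without a dot after the month.
        Date ebay = germanNoDot("dd. MMM yyyy").parse("18. M\u00e4r 2023");
        Date ftp = germanNoDot("MMM dd yyyy").parse("M\u00e4r 14 2022");
        // With dotless names, the dot after "Dez" is matched as a literal.
        Date dotted = germanNoDot("dd. MMM. yyyy HH:mm:ss.SSS")
                .parse("23. Dez. 2016 11:12:13.456");
        SimpleDateFormat iso = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss.SSS", Locale.ROOT);
        System.out.println(iso.format(ebay));
        System.out.println(iso.format(ftp));
        System.out.println(iso.format(dotted));
    }
}
```

Because the same symbols are used for formatting and parsing, the trailing dot in a template like `dd. MMM. yyyy` becomes an ordinary literal again.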
Please note that the system property value `COMPAT` was a temporary
measure for migration, so it is deprecated as of JDK21 and will
eventually be removed in a future JDK release.
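One way to compare both data sets on a given JDK, sketched here with a hypothetical class name, is to print the abbreviated German month once with default settings and once with `java -Djava.locale.providers=COMPAT LocaleProviderCheck`. The property is normally given on the command line so that it is in effect before any locale-sensitive class is used. Expected strings are deliberately omitted, since they vary with the JDK version and its CLDR release.

```java
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.GregorianCalendar;
import java.util.Locale;

public class LocaleProviderCheck {

    // Abbreviated (context-sensitive) German name for March under
    // whichever locale providers are active in this JVM.
    static String germanShortMarch() {
        return new SimpleDateFormat("MMM", Locale.GERMAN)
                .format(new GregorianCalendar(2023, Calendar.MARCH, 1).getTime());
    }

    public static void main(String[] args) {
        // null unless set, e.g. via -Djava.locale.providers=COMPAT
        System.out.println("java.locale.providers = "
                + System.getProperty("java.locale.providers"));
        System.out.println("MMM, de, March: " + germanShortMarch());
    }
}
```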
Also, we are in the process of revising the old release notes with
respect to the CLDR compatibility issues so that users are aware of them
beforehand.
HTH,
Naoto
On 10/10/23 3:47 AM, Lothar Kimmeringer wrote:
Hi,
My first mail on this list, so please be gentle ;-)
I've encountered issues when trying to keep date parsing working while
migrating from Java 8 to Java 11. This happened a while ago and I
implemented local workarounds, but with installations using more recent
versions of Java 11 things broke again, so I'm not sure whether I'm
simply doing things wrong or whether there are actual bugs.
I've attached my JUnit class that contains the different issues (not as
single tests, but I will highlight them here in this mail). SimpleDateFormat
is used there, but I've added a test using DateTimeFormatter to make sure
that the problem is not caused by the old classes and persists in the new
API as well.
Most issues come up when trying to parse abbreviated months with locales
other than "en". Our use case is parsing data with the same date layout
but different locales (e.g. Ebay revenue summary CSV files or FTP servers
on German Windows installations). The dates used there are of the form
- Ebay: 18. Mär 2023
- FTP server: Mär 14 2022
This worked well with MMM in the template up to Java 8. Then LLL got
introduced, and MMM now yields a four-character abbreviation including a
dot. Btw: I think the Javadoc that explains the template parts (e.g. in
SimpleDateFormat) should have an additional column containing an example
for a non-EN locale, because
    M  Month in year (context sensitive)  Month  July; Jul; 07
    L  Month in year (standalone form)    Month  July; Jul; 07
doesn't help at all to see the effect of these two template parts, so
e.g.
    M  Month ... (context...)     Month  January;...  Januar; Jan.; 01
    L  Month ... (standalone...)  Month  January;...  Januar; Jan; 01
would be easier to understand.
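Such a column could also be generated rather than written by hand. The sketch below (class name hypothetical) prints the full, abbreviated and numeric forms of January for M and L in en and de. On a JDK using CLDR data one would expect the German MMM output to carry a dot and LLL not, but the exact strings depend on the JDK's CLDR version, so no expected output is given.

```java
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.Locale;

public class MonthFormsTable {

    // Formats one date with one pattern in one locale.
    static String render(String pattern, Locale locale, Date date) {
        return new SimpleDateFormat(pattern, locale).format(date);
    }

    public static void main(String[] args) {
        Date january = new GregorianCalendar(2023, Calendar.JANUARY, 18).getTime();
        for (String letter : new String[] { "M", "L" }) {
            StringBuilder row = new StringBuilder(letter);
            for (Locale locale : new Locale[] { Locale.ENGLISH, Locale.GERMAN }) {
                // Full, abbreviated and numeric form, as in the Javadoc examples column.
                row.append("  ").append(locale).append(": ")
                   .append(render(letter.repeat(4), locale, january)).append("; ")
                   .append(render(letter.repeat(3), locale, january)).append("; ")
                   .append(render(letter.repeat(2), locale, january));
            }
            System.out.println(row);
        }
    }
}
```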
With LLL, all tests with dates without a dot can be parsed again using
the same template. But it's not possible to consistently parse a date
where the month is always abbreviated with a dot, e.g.
    23. Dez. 2016 11:12:13.456
using the template
    dd. LLL. yyyy HH:mm:ss.SSS
It works with locale en (with "Dec" as the month, of course) but not with
"de". The reason is that SimpleDateFormat uses all month display names
when parsing "month standalone", which includes the abbreviated months
with dots. Because those names are generally longer than their standalone
counterparts (except for three-letter months like "Mai" in German),
matchString considers them the best match and "consumes" the dot in the
text, which is then missing when parsing continues.
DateTimeFormatter seems to work differently, because it doesn't fail at
that point (I haven't debugged it), but it fails when parsing Russian
dates without abbreviating dots. I assume that is because the ru locale
doesn't seem to have values for the standalone month. I could live with
that given our user base, but the parser in java.time runs into problems
when parsing a time with milliseconds: you need to provide exactly as
many "S" as there are digits in the value:
- "23. Dec. 2016 11:12:13.456" needs "dd. LLL. yyyy HH:mm:ss.SSS";
  it doesn't work with "dd. LLL. yyyy HH:mm:ss.S"
- "23. Dec. 2016 11:12:13.4" needs "dd. LLL. yyyy HH:mm:ss.S";
  it doesn't work with "dd. LLL. yyyy HH:mm:ss.SSS"
When handling data from different sources, where one source cuts away
trailing zeros and the other doesn't, you essentially have to inspect
the date string first to choose the correct template for parsing.
SimpleDateFormat accepts the date regardless of the number of "S" in the
template and the actual number of digits in the text to be parsed.
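A common way around this in java.time, sketched here as one option rather than a statement about intended JDK behavior, is to build the fractional part with `DateTimeFormatterBuilder.appendFraction`, which accepts a variable number of digits. The class name `FlexibleFraction` is hypothetical, and the sketch uses MMM with `Locale.ENGLISH` (where "Dec" is the abbreviation) to sidestep the LLL questions above. The `legacyMillis` helper shows a caveat worth knowing: SimpleDateFormat treats the digits after the dot as a plain millisecond count, so ".4" is read as 4 ms rather than 400 ms.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.time.temporal.ChronoField;
import java.util.Calendar;
import java.util.Locale;

public class FlexibleFraction {

    // One formatter for 1 to 9 fraction digits. appendFraction reads them
    // as a decimal fraction and consumes the leading '.' itself
    // (decimalPoint = true), so ".4" means 400 ms.
    static final DateTimeFormatter FORMATTER = new DateTimeFormatterBuilder()
            .appendPattern("dd. MMM. yyyy HH:mm:ss")
            .appendFraction(ChronoField.NANO_OF_SECOND, 1, 9, true)
            .toFormatter(Locale.ENGLISH);

    // Legacy comparison: 'S' is a millisecond count in SimpleDateFormat.
    static int legacyMillis(String text) throws ParseException {
        Calendar cal = Calendar.getInstance();
        cal.setTime(new SimpleDateFormat("HH:mm:ss.S", Locale.ENGLISH).parse(text));
        return cal.get(Calendar.MILLISECOND);
    }

    public static void main(String[] args) throws ParseException {
        // prints 2016-12-23T11:12:13.456
        System.out.println(LocalDateTime.parse("23. Dec. 2016 11:12:13.456", FORMATTER));
        // prints 2016-12-23T11:12:13.400
        System.out.println(LocalDateTime.parse("23. Dec. 2016 11:12:13.4", FORMATTER));
        // prints 4
        System.out.println(legacyMillis("11:12:13.4"));
    }
}
```

Keeping the fraction in a separate builder step means a single formatter covers both sources, without inspecting the string first.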
While my lengthy explanation of the problems with LLL might result in
the answer "not a bug, go away" ;-), I definitely see the milliseconds
issue with java.time.* as one.
Thanks for reading this far and best regards,
Lothar Kimmeringer