paulirwin commented on issue #846:
URL: https://github.com/apache/lucenenet/issues/846#issuecomment-2566110261

   I am happy to report that I have definitively found the source of the 
failures from this original issue. They occur only for the zh-Hant-TW culture, 
only in time zones with a negative offset, and only on .NET 6-8 on Linux and 
macOS.
   
   The root cause is that this culture's `CultureInfo` is missing the `tt` AM/PM 
designator from the format string in `DateTimeFormat.LongTimePattern` (as well 
as in `FullDateTimePattern`, but we aren't using that). So the following code:
   
   ```c#
   using System;
   using System.Globalization;

   var ci = CultureInfo.GetCultureInfo("zh-Hant-TW");
   // Long date pattern followed by long time pattern
   var format = ci.DateTimeFormat.LongDatePattern + " " + ci.DateTimeFormat.LongTimePattern;
   Console.WriteLine(format);
   // 8:00 PM local time on the day before the Unix epoch
   Console.WriteLine(new DateTime(1969, 12, 31, 20, 0, 0).ToString(format, ci));
   ```
   
   ... results in the following on .NET 6-8 on Linux and macOS:
   
   ```
   yyyy年M月d日 dddd h:mm:ss
   1969年12月31日 星期三 8:00:00
   ```
   
   ... whereas it results in the following on .NET 9, because the pattern there 
includes the `tt` designator before the hour:
   ```
   yyyy年M月d日 dddd tth:mm:ss
   1969年12月31日 星期三 下午8:00:00
   ```
   
   Without the 下午, this gets parsed as 8:00 AM on the day before the Unix epoch 
(in this case, in a time zone with a -4:00 offset) rather than 8:00 PM, which, 
when adjusted for the time zone offset, is exactly the Unix epoch. Because the 
parsed time is 12 hours before the epoch, the documents do not match the date 
queries, the expected number of results is not returned, and the test assertion 
fails. This is only a problem with negative offsets: with zero or positive 
offsets, the local time is on or after midnight, so it is correctly parsed as AM 
without the designator.
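
   The 12-hour shift can be reproduced without depending on any ICU culture 
data. The following is a minimal sketch, not the Lucene.NET test code: the two 
patterns, the fixed -4:00 offset, and the `RoundTrip` helper are all 
hypothetical, and the invariant culture stands in for zh-Hant-TW.

```c#
using System;
using System.Globalization;

// Assumption for illustration: a fixed -4:00 offset, so the Unix epoch
// is 1969-12-31 20:00:00 in local time.
var offset = TimeSpan.FromHours(-4);
var epochLocal = new DateTime(1969, 12, 31, 20, 0, 0);

// Hypothetical 12-hour patterns, without and with the "tt" designator.
const string withoutDesignator = "yyyy-MM-dd h:mm:ss";
const string withDesignator = "yyyy-MM-dd tt h:mm:ss";

// Formats the value and parses the text back with the same pattern.
DateTime RoundTrip(DateTime value, string pattern)
{
    var inv = CultureInfo.InvariantCulture;
    string text = value.ToString(pattern, inv);
    return DateTime.ParseExact(text, pattern, inv);
}

// With no designator in the text, the parser assumes AM.
DateTime lossy = RoundTrip(epochLocal, withoutDesignator);
DateTime exact = RoundTrip(epochLocal, withDesignator);

long lossySeconds = new DateTimeOffset(lossy, offset).ToUnixTimeSeconds();
long exactSeconds = new DateTimeOffset(exact, offset).ToUnixTimeSeconds();

Console.WriteLine(lossySeconds); // -43200, i.e. 12 hours before the epoch
Console.WriteLine(exactSeconds); // 0, i.e. the epoch itself
```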
   
   I wrote a small program to check every culture for the same problem, and it 
appears to affect only zh-Hant-TW, and only on net6.0-net8.0. The .NET team 
seems to have fixed this in .NET 9 (possibly unintentionally, by upgrading 
ICU).
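
   One way such a scan could look is sketched below. This is not the program 
mentioned above; `IsAmbiguous12HourPattern` is a hypothetical helper that flags 
any `LongTimePattern` using a 12-hour `h` specifier with no `t` designator, 
and the cultures it reports depend on the runtime and its ICU data.

```c#
using System;
using System.Globalization;

// Hypothetical helper: true when the pattern uses a 12-hour clock ("h"/"hh")
// but has no "t"/"tt" designator. Quoted literals and escaped characters
// are skipped so that literal text is not mistaken for a specifier.
bool IsAmbiguous12HourPattern(string pattern)
{
    bool has12Hour = false, hasDesignator = false;
    char quote = '\0';
    for (int i = 0; i < pattern.Length; i++)
    {
        char c = pattern[i];
        if (quote != '\0')
        {
            if (c == quote) quote = '\0'; // end of quoted literal
            continue;
        }
        if (c == '\'' || c == '"') { quote = c; continue; }
        if (c == '\\') { i++; continue; } // skip the escaped character
        if (c == 'h') has12Hour = true;
        if (c == 't') hasDesignator = true;
    }
    return has12Hour && !hasDesignator;
}

// Report every culture whose long time pattern is ambiguous.
foreach (var ci in CultureInfo.GetCultures(CultureTypes.AllCultures))
{
    string pattern = ci.DateTimeFormat.LongTimePattern;
    if (IsAmbiguous12HourPattern(pattern))
        Console.WriteLine($"{ci.Name}: {pattern}");
}
```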
   
   I am going to fix this by adding another form of "sanity" check for the 
culture/time zone combinations that ensures that the Unix epoch can round-trip 
through ToString/Parse with the given format string. If the check fails, the 
test setup will iterate again until it finds a random culture/time zone 
combination that works.
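
   A round-trip check along these lines might look like the sketch below. It 
is an illustration under stated assumptions, not the actual fix: 
`EpochRoundTrips` is a hypothetical name, the format is assumed to be the long 
date pattern plus the long time pattern as in the snippet earlier in this 
comment, and the cultures and time zone are built by hand so the result does 
not depend on ICU data.

```c#
using System;
using System.Globalization;

// Hypothetical helper: true when the Unix epoch, expressed in the given
// time zone, survives ToString/TryParseExact with the culture's patterns.
bool EpochRoundTrips(CultureInfo culture, TimeZoneInfo timeZone)
{
    string format = culture.DateTimeFormat.LongDatePattern + " "
        + culture.DateTimeFormat.LongTimePattern;
    DateTime epochLocal = TimeZoneInfo.ConvertTimeFromUtc(DateTime.UnixEpoch, timeZone);
    try
    {
        string text = epochLocal.ToString(format, culture);
        return DateTime.TryParseExact(text, format, culture,
                   DateTimeStyles.None, out DateTime parsed)
               && parsed == epochLocal;
    }
    catch (Exception e) when (e is FormatException || e is ArgumentOutOfRangeException)
    {
        // Treat a culture that cannot even format the date as unusable.
        return false;
    }
}

// Hand-built inputs (assumptions for illustration only).
var minusFour = TimeZoneInfo.CreateCustomTimeZone(
    "Test-4", TimeSpan.FromHours(-4), "Test -4", "Test -4");

var broken = (CultureInfo)CultureInfo.InvariantCulture.Clone();
broken.DateTimeFormat.LongTimePattern = "h:mm:ss";     // 12-hour, no designator

var working = (CultureInfo)CultureInfo.InvariantCulture.Clone();
working.DateTimeFormat.LongTimePattern = "tt h:mm:ss"; // 12-hour with designator

Console.WriteLine(EpochRoundTrips(broken, minusFour));        // False
Console.WriteLine(EpochRoundTrips(working, minusFour));       // True
Console.WriteLine(EpochRoundTrips(broken, TimeZoneInfo.Utc)); // True
```

   The last line matches the observation that a zero offset is unaffected: 
midnight formats as 12:00:00 and is parsed back as 12:00 AM. A test setup 
could call such a helper in a loop and keep drawing random culture/time zone 
pairs until it returns true.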
   
   Additionally, through many repeated random runs, I found another failure 
that had not been reported yet. For cultures that use a decimal comma, such as 
sv-FI, small decimal values can fail due to a J2N round-trip formatting/parsing 
bug that occurs when a decimal comma is combined with exponential notation. 
That has been filed as https://github.com/NightOwl888/J2N/issues/128.

