kumarUjjawal commented on issue #21515:
URL: https://github.com/apache/datafusion/issues/21515#issuecomment-4647505735

   I wanted to share my findings.
   
   As per the issue:
   Spark keeps timestamps as microseconds, but hands that raw number to Java's 
formatter, which reads it as milliseconds. So every %t value comes out 1000x 
off, which is why a normal 2023 date prints as the year 55952.
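   
   To make the arithmetic concrete, here is a minimal sketch in plain Rust (standard library only). The function names are hypothetical and are not taken from DataFusion or Spark. It assumes UTC and the proleptic Gregorian calendar, and it only models the unit mix-up described above, not the rest of the %t formatting.

```rust
/// Convert days since 1970-01-01 to a proleptic Gregorian (year, month, day).
/// Integer-only "civil from days" arithmetic, so it has no upper year limit
/// other than i64 overflow.
fn civil_from_days(days: i64) -> (i64, u32, u32) {
    let z = days + 719_468; // shift the epoch to 0000-03-01
    let era = z.div_euclid(146_097); // 400-year cycles
    let doe = z.rem_euclid(146_097); // day of era, 0..=146_096
    let yoe = (doe - doe / 1_460 + doe / 36_524 - doe / 146_096) / 365; // 0..=399
    let doy = doe - (365 * yoe + yoe / 4 - yoe / 100); // day of March-based year
    let mp = (5 * doy + 2) / 153; // 0 = March, ..., 11 = February
    let day = (doy - (153 * mp + 2) / 5 + 1) as u32;
    let month = (if mp < 10 { mp + 3 } else { mp - 9 }) as u32;
    // January and February belong to the next civil year.
    let year = yoe + era * 400 + if month <= 2 { 1 } else { 0 };
    (year, month, day)
}

/// UTC calendar date for a count of milliseconds since the Unix epoch.
fn utc_date_from_millis(millis: i64) -> (i64, u32, u32) {
    civil_from_days(millis.div_euclid(86_400_000))
}

/// What one would normally expect: microseconds scaled down to milliseconds.
fn correct_date(micros: i64) -> (i64, u32, u32) {
    utc_date_from_millis(micros.div_euclid(1_000))
}

/// The quirk (hypothetical model): the raw microsecond count is read as
/// milliseconds, with no division by 1000.
fn spark_quirk_date(micros: i64) -> (i64, u32, u32) {
    utc_date_from_millis(micros)
}
```

   With 1_700_000_000_000_000 microseconds as input, `correct_date` gives (2023, 11, 14), while `spark_quirk_date` lands in the year 55840, the same order of magnitude as the 55952 in the issue.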
   
   The problem is what that 1000x does downstream. Those inflated values land 
tens of thousands (and in some cases millions) of years in the future, which is 
well outside the range the date library we normally use can handle. So to 
reproduce Spark's output faithfully, we would end up having to write our own 
date math and our own timezone/daylight-saving handling from scratch.
   
   That leaves me unsure whether this is worth the maintenance cost, so I 
wanted to check the direction. The options I see are:
   
     1. Leave the current (correct-looking) output as-is and just document that 
we intentionally differ from Spark on this one quirk.
     2. Match the quirk only for the common UTC case, which is far less code, 
and document that unusual timezones at extreme dates may not match.
     3. Go for full fidelity, which is the large change described above.
   
   Worth noting: this only affects the %t timestamp specifiers; the rest of 
format_string already works.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
