waitingkuo commented on issue #2979:
URL: 
https://github.com/apache/arrow-datafusion/issues/2979#issuecomment-1208194927

   @alamb after investigating more on postgresql's, i found that we have 
following differences:
   
   ### Differences
   
   1. postgresql is using microsecond for the resolution of timestamp, 
`datafusion` is using nanoseconds
   
   This is postgresql's
   
![image](https://user-images.githubusercontent.com/1100923/183434508-0dbeeea3-fba3-4851-9ea0-bb90ab2097e9.png)
   
   2. postgresql `to_timestamp` have 2 kinds of signatures:
   
   - 2.1 `to_timestamp(double precision) -> timestamp with time zone`
   
   <https://www.postgresql.org/docs/14/functions-datetime.html>
   
   
![image](https://user-images.githubusercontent.com/1100923/183435339-ae81a5f8-29b7-48f9-ae77-0d8c61a324de.png)
   
   the input for `to_timestamp` in posgresql has the unit second. i.e. 
`to_timestamp(1)` means 1 second after 1970 but not 1 microconds.
   Current `datafusion` supports `Int64, TimeUnit(Second), 
TimeUnit(MicroSecond), TimeUnit(MilliSecond), TimeUnit(NanoSeconds), and 
UTF-8`. Our current `to_timestamp(int64)` deems the input as nanoseconds 
instead of seconds which is different from postgresql's. Postresql doesnt 
support `to_timestamp(TimeUnit)`. I suggest to align `to_timestamp(i64)` to 
posgresql's; and add another to_timestamp(float64). And leave the 
`to_timestamp(TimeUnit)` behave the same as what we have now (conversion 
between units). `to_timestamp(UTF-8)` is in 2.2
   
   - 2.2 `to_timestamp`
   <https://www.postgresql.org/docs/14/functions-formatting.html>
   
   
![image](https://user-images.githubusercontent.com/1100923/183435566-e931e12f-6b63-43e3-95f6-67f4ce44aef6.png)
   
   `datafusion` currently supports `to_timestamp(utf-8)` without the 2nd 
arguments. we use [chrono](https://docs.rs/chrono/latest/chrono/)
   ) to parse the time string without specify the format (while posgresql's 
need us to specify the format)
   
   
   3. postgresql's `to_timestamp` output timestamp with timezone, while 
`datafusion` outputs `TimeUnit(UNIT, None)`. The second argument for `TimeUnit` 
in `arrow` in implements timezone, just we don't use it in `to_timestamp` yet
   
   
   
   ### Summary & My proposal
   
   1. keep using nanosecond as default while there's no specific timeunit set
   2. make the input unit for `to_timestamp(I64)`, `to_timestamp_micros(I64)`, 
`to_timestamp_millis(I64)`, `to_timestamp_seconds(I64)` as seconds. (it's 
currently nano for to_timestamp, milli for time_timestamp_millis, ...)
   3. support `to_timestamp(float64)` and `to_timestamp_xx(float64) 
   4. keep using chorno's time string format for now
   5. start another thread for dicussing timezone implementation, I feel like 
this might break many things. And we also need to implement something like `SET 
TIME ZONE AS xxx` like what posgresql has
   
   Please leave your comments. I'll submit another issue for tracking all of 
these.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to