waitingkuo commented on issue #2979: URL: https://github.com/apache/arrow-datafusion/issues/2979#issuecomment-1208194927
@alamb after investigating more on postgresql's, i found that we have following differences: ### Differences 1. postgresql is using microsecond for the resolution of timestamp, `datafusion` is using nanoseconds This is postgresql's  2. postgresql `to_timestamp` have 2 kinds of signatures: - 2.1 `to_timestamp(double precision) -> timestamp with time zone` <https://www.postgresql.org/docs/14/functions-datetime.html>  the input for `to_timestamp` in posgresql has the unit second. i.e. `to_timestamp(1)` means 1 second after 1970 but not 1 microconds. Current `datafusion` supports `Int64, TimeUnit(Second), TimeUnit(MicroSecond), TimeUnit(MilliSecond), TimeUnit(NanoSeconds), and UTF-8`. Our current `to_timestamp(int64)` deems the input as nanoseconds instead of seconds which is different from postgresql's. Postresql doesnt support `to_timestamp(TimeUnit)`. I suggest to align `to_timestamp(i64)` to posgresql's; and add another to_timestamp(float64). And leave the `to_timestamp(TimeUnit)` behave the same as what we have now (conversion between units). `to_timestamp(UTF-8)` is in 2.2 - 2.2 `to_timestamp` <https://www.postgresql.org/docs/14/functions-formatting.html>  `datafusion` currently supports `to_timestamp(utf-8)` without the 2nd arguments. we use [chrono](https://docs.rs/chrono/latest/chrono/) ) to parse the time string without specify the format (while posgresql's need us to specify the format) 3. postgresql's `to_timestamp` output timestamp with timezone, while `datafusion` outputs `TimeUnit(UNIT, None)`. The second argument for `TimeUnit` in `arrow` in implements timezone, just we don't use it in `to_timestamp` yet ### Summary & My proposal 1. keep using nanosecond as default while there's no specific timeunit set 2. make the input unit for `to_timestamp(I64)`, `to_timestamp_micros(I64)`, `to_timestamp_millis(I64)`, `to_timestamp_seconds(I64)` as seconds. (it's currently nano for to_timestamp, milli for time_timestamp_millis, ...) 3. support `to_timestamp(float64)` and `to_timestamp_xx(float64) 4. keep using chorno's time string format for now 5. start another thread for dicussing timezone implementation, I feel like this might break many things. And we also need to implement something like `SET TIME ZONE AS xxx` like what posgresql has Please leave your comments. I'll submit another issue for tracking all of these. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
