velvia commented on issue #686: URL: https://github.com/apache/arrow-datafusion/issues/686#issuecomment-875109908
It sounds like everyone agrees second, optional argument to `to_timestamp*()` to force a specific timezone result works. As for casting/coercion, we have: @alamb > I think a better solution would be for DataFusion to implement coercion rules between date/time/timestamp types. In your example of a timestamp_col with Timestamp(Milliseconds, Some("UTC")), then I would like DataFusion to handle WHERE timestamp_col > to_timestamp("2021-06-21T12:00Z") by adding the following cast (please forgive me the pseudo-SQL) WHERE timestamp_col > cast (to_timestamp("2021-06-21T12:00Z") as timestamp(UTC)) The above is saying that `Timestamp(_, None)` can be coerced into `Timestamp(_, Some("UTC"))`, is that right? (AFAICT, we can't just say that only functions called in WHERE clauses that output None be coerced?) However, a None implies local timestamp, which is not the same as an instant with UTC (although semantically for UrbanLogiq that definitely works. :) OTOH @westonpace wrote: > I'm not sure if you are saying the are not comparable or they should not be comparable. They "should" be comparable if you are comparing two columns with a non-None timestamp (e.g. Timestamp(units, "UTC") should be comparable with Timestamp(units, "+0700") but neither should compare to Timestamp(units, None). This is going by the newly discussed "conventional" logic where None is "local" and UTC is "instant. Since DF has it backwards right now you'd need to invert the logic. It seems to me that coercing Timestamp(_, None) to `Timestamp(_, "UTC")` would not fit the recent convention, at least automatic coercion without context. So follow up / action items / possibilities: 1. Should the current output of `to_timestamp(...)` default to `Timestamp(_, Some(UTC))` ? This sounds like a yes (and would solve my use case) 2. `to_timestamp(....)` can be extended to add a second argument to accept different time zones, or None (not sure how to specify None if the default is UTC, per item 1) 3. Coercion ... the details of this needs to be figured out, semantics 4. Or, simplify everything and have the timestamp types not have timezone, but just if they are UTC or local (similar to Parquet). BTW I looked up the C/C++ API, and the Python API, and none of those appear to support timezones in the timestamp type, just a unit. It could be argued that 4) would bring the Rust API closer to the other languages APIs, and when arrow datatypes are translated from Flight or the binary representation, how is the timezone encoded, if at all? (if 4 is a worthy avenue to discuss, I can start a email chain) cheers and thanks -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org