velvia commented on issue #686:
URL:
https://github.com/apache/arrow-datafusion/issues/686#issuecomment-875109908
It sounds like everyone agrees second, optional argument to
`to_timestamp*()` to force a specific timezone result works.
As for casting/coercion, we have:
@alamb
> I think a better solution would be for DataFusion to implement coercion
rules between date/time/timestamp types. In your example of a timestamp_col
with Timestamp(Milliseconds, Some("UTC")), then I would like DataFusion to
handle
WHERE timestamp_col > to_timestamp("2021-06-21T12:00Z")
by adding the following cast (please forgive me the pseudo-SQL)
WHERE timestamp_col > cast (to_timestamp("2021-06-21T12:00Z") as
timestamp(UTC))
The above is saying that `Timestamp(_, None)` can be coerced into
`Timestamp(_, Some("UTC"))`, is that right? (AFAICT, we can't just say that
only functions called in WHERE clauses that output None be coerced?) However,
a None implies local timestamp, which is not the same as an instant with UTC
(although semantically for UrbanLogiq that definitely works. :)
OTOH @westonpace wrote:
> I'm not sure if you are saying the are not comparable or they should not
be comparable. They "should" be comparable if you are comparing two columns
with a non-None timestamp (e.g. Timestamp(units, "UTC") should be comparable
with Timestamp(units, "+0700") but neither should compare to Timestamp(units,
None). This is going by the newly discussed "conventional" logic where None is
"local" and UTC is "instant. Since DF has it backwards right now you'd need to
invert the logic.
It seems to me that coercing Timestamp(_, None) to `Timestamp(_, "UTC")`
would not fit the recent convention, at least automatic coercion without
context.
So follow up / action items / possibilities:
1. Should the current output of `to_timestamp(...)` default to
`Timestamp(_, Some(UTC))` ? This sounds like a yes (and would solve my use
case)
2. `to_timestamp(....)` can be extended to add a second argument to accept
different time zones, or None (not sure how to specify None if the default is
UTC, per item 1)
3. Coercion ... the details of this needs to be figured out, semantics
4. Or, simplify everything and have the timestamp types not have timezone,
but just if they are UTC or local (similar to Parquet).
BTW I looked up the C/C++ API, and the Python API, and none of those appear
to support timezones in the timestamp type, just a unit. It could be argued
that 4) would bring the Rust API closer to the other languages APIs, and when
arrow datatypes are translated from Flight or the binary representation, how is
the timezone encoded, if at all?
(if 4 is a worthy avenue to discuss, I can start a email chain)
cheers and thanks
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]