velvia commented on issue #686:
URL: 
https://github.com/apache/arrow-datafusion/issues/686#issuecomment-875109908


   It sounds like everyone agrees second, optional argument to 
`to_timestamp*()` to force a specific timezone result works.
   
   As for casting/coercion, we have:
   
   @alamb 
   > I think a better solution would be for DataFusion to implement coercion 
rules between date/time/timestamp types. In your example of a timestamp_col 
with Timestamp(Milliseconds, Some("UTC")), then I would like DataFusion to 
handle
   
       WHERE timestamp_col > to_timestamp("2021-06-21T12:00Z")
   by adding the following cast (please forgive me the pseudo-SQL)
   
       WHERE timestamp_col > cast (to_timestamp("2021-06-21T12:00Z") as 
timestamp(UTC))
   
   The above is saying that `Timestamp(_, None)` can be coerced into 
`Timestamp(_, Some("UTC"))`, is that right?  (AFAICT, we can't just say that 
only functions called in WHERE clauses that output None be coerced?)   However, 
a None implies local timestamp, which is not the same as an instant with UTC 
(although semantically for UrbanLogiq that definitely works. :)
   
   OTOH @westonpace wrote:
   
   > I'm not sure if you are saying the are not comparable or they should not 
be comparable. They "should" be comparable if you are comparing two columns 
with a non-None timestamp (e.g. Timestamp(units, "UTC") should be comparable 
with Timestamp(units, "+0700") but neither should compare to Timestamp(units, 
None). This is going by the newly discussed "conventional" logic where None is 
"local" and UTC is "instant. Since DF has it backwards right now you'd need to 
invert the logic.
   
   It seems to me that coercing Timestamp(_, None) to `Timestamp(_, "UTC")` 
would not fit the recent convention, at least automatic coercion without 
context.
   
   So follow up / action items / possibilities:
   
   1.  Should the current output of `to_timestamp(...)` default to 
`Timestamp(_, Some(UTC))` ?   This sounds like a yes (and would solve my use 
case)
   2.  `to_timestamp(....)` can be extended to add a second argument to accept 
different time zones, or None (not sure how to specify None if the default is 
UTC, per item 1)
   3.  Coercion ... the details of this needs to be figured out, semantics
   4.  Or, simplify everything and have the timestamp types not have timezone, 
but just if they are UTC or local (similar to Parquet).
   
   BTW I looked up the C/C++ API, and the Python API, and none of those appear 
to support timezones in the timestamp type, just a unit.   It could be argued 
that 4) would bring the Rust API closer to the other languages APIs, and when 
arrow datatypes are translated from Flight or the binary representation, how is 
the timezone encoded, if at all?   
   
   (if 4 is a worthy avenue to discuss, I can start a email chain)
   
   cheers and thanks


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Reply via email to