velvia commented on issue #686:
URL:
https://github.com/apache/arrow-datafusion/issues/686#issuecomment-876625232
@adamhopper personally I'd be OK with your recommendations, but some
thoughts:
- There are some interop considerations between `TIMESTAMP WITH TIMEZONE`,
i.e. `TimestampType(_, UTC)`, and `TimestampType(_, None)`. So far DataFusion
has been generating `TimestampType(_, None)`. If this changes to UTC, which
makes sense as far as conventions go, then people whose data is
`Timestamp(_, None)` need some way to convert it over to `Timestamp(_, UTC)`.
- I'm not sure it would be acceptable to throw an error on
`isAdjustedToUTC = false`, as I'm sure people have Parquet files out there
with this setting, and then DataFusion/Arrow wouldn't be able to read them.
But maybe it's OK if you give people a way to cast / convert the timestamps
over.
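
To make the "convert it over" idea concrete, here is a rough sketch in plain
Python (standard library only, not DataFusion or Arrow code). The helper name
`assume_utc` is made up for illustration. It assumes the timezone-less value
was recorded as UTC wall-clock time, so it only attaches the UTC label and
does not shift the instant:

```python
from datetime import datetime, timezone


def assume_utc(ts: datetime) -> datetime:
    """Relabel a timezone-less timestamp as UTC (hypothetical helper).

    Assumption: the naive value already represents UTC wall-clock time,
    so the stored fields are kept and only the timezone is attached.
    """
    if ts.tzinfo is not None:
        raise ValueError("timestamp already carries a timezone")
    return ts.replace(tzinfo=timezone.utc)


# Analogous to going from Timestamp(_, None) to Timestamp(_, UTC).
naive = datetime(2021, 1, 1, 0, 0, 0)
aware = assume_utc(naive)
```

Whether relabeling like this is the right semantics for existing data is
exactly the interop question above; data written in some other local time
would need a real timezone conversion instead.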
Also, I see your point about a timezone argument to `to_timestamp()` being
confusing. However, there is a second use of `to_timestamp()` right now,
which is conversion from other types to `Timestamp(resolution, tz)`. It
exists in this form because there is no standard SQL/PG way to work with
multiple timestamp resolutions, which are required and supported by all Arrow
implementations.
Thus, while `TO_TIMESTAMP('2021-01-01T00:00:00', 'America/Montreal')` might
be confusing, I believe the following would be clearer:
* `to_timestamp_millis(my_timestamp_col, 'UTC')` -> cast `my_timestamp_col`
from an int or another timestamp resolution to millisecond resolution, and
make sure it has the UTC timezone
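
As a rough illustration of the intended semantics only, here is a plain
Python sketch. It is not DataFusion's implementation, and the signature is an
assumption: the source unit is passed explicitly here because a bare Python
integer carries no resolution, whereas a typed column would. The result is
modelled as a `(value, unit, tz)` tuple:

```python
from typing import Tuple

# Ticks per second for each timestamp resolution supported by Arrow.
_TICKS_PER_SECOND = {"s": 1, "ms": 1_000, "us": 1_000_000, "ns": 1_000_000_000}


def to_timestamp_millis(value: int, unit: str, tz: str = "UTC") -> Tuple[int, str, str]:
    """Rescale an epoch value to milliseconds and tag it with a timezone.

    Hypothetical model of the proposed function, not an existing API.
    """
    ticks = _TICKS_PER_SECOND[unit]
    if ticks <= 1_000:
        # Coarser or equal resolution: scale up, no precision is lost.
        millis = value * (1_000 // ticks)
    else:
        # Finer resolution: floor division drops sub-millisecond digits.
        millis = value // (ticks // 1_000)
    return (millis, "ms", tz)


# 2021-01-01T00:00:00Z expressed in seconds and in microseconds.
from_seconds = to_timestamp_millis(1_609_459_200, "s")
from_micros = to_timestamp_millis(1_609_459_200_123_456, "us")
```

Here the timezone argument only labels the result and never shifts the
value, which is the reading I think is less confusing than a timezone
argument on a string-parsing `TO_TIMESTAMP`.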