adamhooper commented on issue #686: URL: https://github.com/apache/arrow-datafusion/issues/686#issuecomment-876681175
@velvia Great points. My last two suggestions were exactly about interop -- and a transition period. Today, DataFusion users interpret timezone=null to mean `TIMESTAMP WITH TIME ZONE`; but eventually, DataFusion must consider timezone=null to mean `TIMESTAMP WITHOUT TIME ZONE`, right? Users will need to change what they're doing; the transition path will be hard, and it depends on the DataFusion community.

I hadn't thought of timestamp resolution. Again, I'm out of my element (for now) :).

The crux of my suggestion is to ignore the value of the `timezone` metadata field and treat it as a boolean: `timezone=null` or `timezone=UTC`. That translates Parquet <=> Arrow <=> PostgreSQL cleanly. Treating it like a boolean keeps the feature list small and saves people from confusion.

As for the `TO_TIMESTAMP()` parameter: Postgres has a different function that might do exactly what you're suggesting. [`AT TIME ZONE`](https://www.postgresql.org/docs/13/functions-datetime.html#FUNCTIONS-DATETIME-ZONECONVERT) _toggles_ the "`WITH TIME ZONE`-ness" of a timestamp. It's also callable as a function, `TIMEZONE(zone, timestamp)`.

But -- sidetracking here -- how important are these types? `TIMESTAMP WITH TIME ZONE` is clearly mission-critical. I think `TIMESTAMP WITHOUT TIME ZONE` has its place; but it's in a different ballpark, right? (Even Spark doesn't have `TIMESTAMP WITHOUT TIME ZONE`; and, well, does DataFusion?) Arrow's timezone metadata column is even more obscure: I've only heard of it in Pandas and R.
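
To make the "treat it as a boolean" idea concrete, here is a minimal sketch of the mapping I have in mind. The function names `sql_type_for` and `arrow_timezone_for` are hypothetical, made up for illustration; this is not DataFusion or Arrow API, only plain Python showing the two-way translation under the assumption that any non-null timezone is treated the same.

```python
from typing import Optional


def sql_type_for(arrow_timezone: Optional[str]) -> str:
    """Map Arrow timestamp timezone metadata to a SQL type name.

    Hypothetical helper: only the presence of the metadata is inspected,
    never the zone name itself.
    """
    if arrow_timezone is None:
        return "TIMESTAMP WITHOUT TIME ZONE"
    # Any non-null value ("UTC", "America/Toronto", ...) is treated alike.
    return "TIMESTAMP WITH TIME ZONE"


def arrow_timezone_for(sql_type: str) -> Optional[str]:
    """Inverse mapping: only null or "UTC" is ever produced."""
    normalized = " ".join(sql_type.upper().split())
    if normalized == "TIMESTAMP WITH TIME ZONE":
        return "UTC"
    if normalized == "TIMESTAMP WITHOUT TIME ZONE":
        return None
    raise ValueError(f"not a timestamp type: {sql_type!r}")
```

With this shape, a non-UTC zone name does not survive a round trip (it comes back as `UTC`), which is the point: the zone name carries no meaning in this scheme, so there is nothing to preserve.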
