enrico-stauss commented on issue #3766: URL: https://github.com/apache/arrow-adbc/issues/3766#issuecomment-3613069302
Thanks @zeroshade, I overlooked that part of the discussion. But I would have assumed that the driver reads the Parquet file, inspects the metadata, makes an informed decision on what the column type should be, and sends that information to Snowflake together with the data? I think that this behaviour is wrong; let me try to lay out why.

## Parquet `isAdjustedToUTC=true` should map to `TIMESTAMP_TZ`, not `TIMESTAMP_LTZ`

**Current behavior:** ADBC maps Parquet timestamps with `isAdjustedToUTC=true` and timezone=UTC to Snowflake's `TIMESTAMP_LTZ`.

**Problem:** `TIMESTAMP_LTZ` displays times in the session's local timezone. When you write `2024-01-01 12:00:00 UTC` to Parquet and load it via ADBC:

- Session in UTC queries it: `2024-01-01 12:00:00` (correct - this is noon UTC)
- Session in Tokyo queries it: `2024-01-01 12:00:00` (wrong - this displays as noon JST, which is actually 03:00 UTC)
- Session in New York queries it: `2024-01-01 12:00:00` (wrong - this displays as noon EST, which is actually 17:00 UTC)

**Everyone sees the same wall-clock time, but they're looking at different moments in time.** This is dangerous because users in different locations think they're seeing the same data, but they're actually seeing values shifted by their timezone offset.

**Proposed fix:** Map to `TIMESTAMP_TZ` with an explicit UTC offset (`+00:00`), which would display as `2024-01-01 12:00:00+00:00` for all users regardless of session timezone.

**Why this works:**

1. Per the [Parquet spec](https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#timestamp), `isAdjustedToUTC=true` means timestamps are already normalized to UTC - the Parquet file stores only an INT64 with no offset information
2. Times display consistently as UTC across all sessions, matching the source data semantics
3. Maintains interoperability with Pandas/Polars, which correctly preserve the UTC timezone

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at: [email protected]
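
The offset arithmetic quoted in the comment can be checked with a short standard-library sketch. It only illustrates which UTC instant a given wall-clock reading corresponds to, and what an explicit `+00:00` rendering looks like; it does not call Snowflake or the ADBC driver. The `ZONES` table and both helper names are hypothetical, and the fixed offsets (JST = +09:00, EST = -05:00) are an assumption that holds for 1 January, when no daylight saving time applies.

```python
from datetime import datetime, timedelta, timezone

UTC = timezone.utc

# Hypothetical session timezones, modelled as fixed offsets (assumption:
# valid for 2024-01-01, when New York is on EST and Japan has no DST).
ZONES = {
    "UTC": UTC,
    "Tokyo": timezone(timedelta(hours=9), "JST"),
    "New York": timezone(timedelta(hours=-5), "EST"),
}


def noon_wall_clock_as_utc(tz):
    """Read 2024-01-01 12:00:00 as wall-clock time in tz; return the UTC instant."""
    return datetime(2024, 1, 1, 12, 0, 0, tzinfo=tz).astimezone(UTC)


def render_with_offset(instant):
    """Render an aware datetime with its explicit UTC offset."""
    return instant.isoformat(sep=" ")


# The value written to Parquet in the example: noon UTC.
stored = datetime(2024, 1, 1, 12, 0, 0, tzinfo=UTC)

for name, tz in ZONES.items():
    # Same wall-clock reading, different instant depending on the zone.
    print(f"noon in {name:8} -> {render_with_offset(noon_wall_clock_as_utc(tz))}")

# Rendering with an explicit offset is unambiguous for every reader.
print("stored value     ->", render_with_offset(stored))
```

A reading of `12:00:00` without an offset is ambiguous across the three zones, whereas a rendering that carries `+00:00` identifies a single instant.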
