enrico-stauss commented on issue #3766:
URL: https://github.com/apache/arrow-adbc/issues/3766#issuecomment-3613069302

   Thanks @zeroshade, I overlooked that part of the discussion. But I would 
have assumed that the driver reads the parquet file, then inspects the metadata 
and makes an informed decision on what the column type should be, and sends 
that information to Snowflake together with the data?
   
   I think that this behaviour is wrong; let me try to lay out why.
   
   
   ## Parquet `isAdjustedToUTC=true` should map to `TIMESTAMP_TZ`, not `TIMESTAMP_LTZ`
   
   **Current behavior:** ADBC maps Parquet timestamps with 
`isAdjustedToUTC=true` and timezone=UTC to Snowflake's `TIMESTAMP_LTZ`.
   
   **Problem:** `TIMESTAMP_LTZ` displays times in the session's local timezone. 
When you write `2024-01-01 12:00:00 UTC` to Parquet and load it via ADBC:
   - Session in UTC queries it: `2024-01-01 12:00:00` (correct - this is noon 
UTC)
   - Session in Tokyo queries it: `2024-01-01 12:00:00` (wrong - this displays 
as noon JST, which is actually 03:00 UTC)
   - Session in New York queries it: `2024-01-01 12:00:00` (wrong - this 
displays as noon EST, which is actually 17:00 UTC)
   
   **Everyone sees the same wall-clock time, but they're looking at different 
moments in time.** This is dangerous because users in different locations think 
they're seeing the same data, but they're actually seeing values shifted by 
their timezone offset.
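
To make the arithmetic in the list above concrete, here is a minimal Python sketch. It does not talk to Snowflake or ADBC; it only shows which UTC instant the same `12:00:00` wall-clock reading corresponds to in each zone. The fixed offsets (`+09:00` for JST, `-05:00` for EST) and the helper name `as_utc` are assumptions made for illustration.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets chosen for illustration only (assumption: January, so no DST).
UTC = timezone.utc
JST = timezone(timedelta(hours=9), "JST")
EST = timezone(timedelta(hours=-5), "EST")

# The naive wall-clock value every session is described as seeing.
wall_clock = datetime(2024, 1, 1, 12, 0, 0)


def as_utc(naive: datetime, tz: timezone) -> datetime:
    """Interpret a naive wall-clock value in `tz` and return the UTC instant."""
    return naive.replace(tzinfo=tz).astimezone(UTC)


instants = {
    name: as_utc(wall_clock, tz)
    for name, tz in [("UTC", UTC), ("Tokyo", JST), ("New York", EST)]
}

for name, instant in instants.items():
    print(f"{name:9s} 12:00:00 local -> {instant.isoformat()}")
```

The three results are 12:00, 03:00 and 17:00 UTC respectively, i.e. three different instants behind one identical-looking reading.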
   
   **Proposed fix:** Map to `TIMESTAMP_TZ` with explicit UTC offset (`+00:00`), 
which would display as `2024-01-01 12:00:00+00:00` for all users regardless of 
session timezone.
   
   **Why this works:**
   1. Per the [Parquet spec](https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#timestamp),
 `isAdjustedToUTC=true` means timestamps are already normalized to UTC - the 
Parquet file stores only an INT64 with no offset information
   2. Times display consistently as UTC across all sessions, matching the 
source data semantics
   3. Maintains interoperability with Pandas/Polars, which correctly preserve 
the UTC timezone
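
As a rough sketch of points 1 and 2, the snippet below treats a stored value as a plain INT64 count of microseconds since the Unix epoch (the microsecond unit is an assumption; Parquet also allows millis and nanos) and renders it with an explicit `+00:00` offset. The function name `render_utc_micros` is hypothetical and is not part of ADBC or any Parquet library.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def render_utc_micros(value: int) -> str:
    """Render an INT64 count of microseconds since the epoch as UTC text."""
    return (EPOCH + timedelta(microseconds=value)).isoformat(sep=" ")


# The integer that would represent 2024-01-01 12:00:00 UTC in this encoding.
stored = (datetime(2024, 1, 1, 12, tzinfo=timezone.utc) - EPOCH) // timedelta(
    microseconds=1
)

print(stored)
print(render_utc_micros(stored))
```

The integer carries no offset of its own, so the only unambiguous rendering is one that states the UTC offset explicitly, which is what the proposed `TIMESTAMP_TZ` mapping is meant to achieve.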
   

