zeroshade commented on issue #2843:
URL: https://github.com/apache/arrow-adbc/issues/2843#issuecomment-2907209749

   Ah, so the most likely issue here is the way we do ingestion for Snowflake. 
Snowflake doesn't provide a native Arrow ingestion method, so adbc_ingest 
writes out Parquet files and then uses `COPY INTO` on the Snowflake side to 
copy the data from the uploaded Parquet files into the Snowflake table. In the 
best-case scenario, we'd be ingesting Parquet files in which the timestamp 
column has the `isAdjustedToUTC` flag set to true in its logical type.
   
   Now, according to the Snowflake docs: 
   
   > TIMESTAMP_TZ internally stores UTC time together with an associated time 
zone offset. When a time zone isn’t provided, the session time zone offset is 
used.
   
   So, assuming the uploaded Parquet file has the proper logical type, it comes 
down to how Snowflake handles the `COPY INTO` of the Parquet file into the 
column, since Parquet doesn't have a per-row time zone either. Essentially, one 
of the following is happening:
   
   1. Polars generates the appropriate Arrow type, `Timestamp[ns, UTC]`, we 
generate Parquet files with the type 
`Timestamp[isAdjustedToUTC=true,unit=NANOS]` for ingestion, and Snowflake 
doesn't respect the logical type. As a result, it assumes the values have no 
associated time zone offset and assigns them the session time zone (which 
defaults to the account time zone).
   2. Polars generates Arrow data with an empty timezone instead of "UTC", 
resulting in the Parquet files being written with `isAdjustedToUTC=false`. 
Snowflake then acts accordingly by assuming the values are in the 
session/account time zone.
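
   Either way, the visible effect is the same: a wall-clock value that was 
meant as UTC gets read as local to the session. A minimal sketch of that 
arithmetic, using only the Python standard library; the timestamp and the 
UTC-07:00 session offset are made-up values for illustration and are not taken 
from the issue:

```python
from datetime import datetime, timedelta, timezone

# Hypothetical values, for illustration only.
wall_clock = datetime(2025, 5, 25, 12, 0, 0)   # value as written to the file
session_tz = timezone(timedelta(hours=-7))     # assumed session offset (UTC-07:00)

# Intended reading: the stored value is an instant in UTC.
intended = wall_clock.replace(tzinfo=timezone.utc)

# Reading when the UTC annotation is ignored (scenario 1) or absent
# (scenario 2): the same wall-clock value is taken as session-local time.
misread = wall_clock.replace(tzinfo=session_tz)

# The two readings differ by exactly the session offset.
shift = misread - intended

print("intended:", intended.isoformat())
print("misread :", misread.astimezone(timezone.utc).isoformat())
print("shift   :", shift)
```

   With these assumed values the two readings are seven hours apart, which is 
the kind of constant offset one would expect to see in the ingested column 
under either scenario.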
   
   In scenario 1: I would argue the issue is on Snowflake's side and they would 
have to address it.
   
   In scenario 2: Polars should be fixed so that it generates the type using 
the "UTC" timezone explicitly, and it would still remain to be seen whether 
Snowflake respects that.
   
   In either case, another option could be to simply set the session time zone 
to UTC before running the ingest, which would at least make Snowflake apply the 
correct time zone in this case. But that isn't a good general solution.
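
   For reference, a rough sketch of that workaround through the ADBC Python 
DB-API bindings. The helper name `ingest_with_utc_session` is hypothetical, and 
the sketch assumes that `conn` is a connection such as the one returned by 
`adbc_driver_snowflake.dbapi.connect(...)` and that the ingest runs in the same 
session as the `ALTER SESSION` statement:

```python
def ingest_with_utc_session(conn, table_name, data, mode="append"):
    """Pin the session time zone to UTC, then bulk-ingest `data`.

    Hypothetical helper. `conn` is assumed to be an ADBC DB-API connection;
    `data` is anything the cursor's adbc_ingest accepts (for example a
    pyarrow Table or RecordBatchReader).
    """
    with conn.cursor() as cur:
        # Snowflake session parameter; applies to this session only.
        cur.execute("ALTER SESSION SET TIMEZONE = 'UTC'")
        # Same connection, so the COPY INTO issued by the ingest should
        # see the UTC session time zone (assumption, not verified here).
        return cur.adbc_ingest(table_name, data, mode=mode)


# Usage (requires a real Snowflake account, so it is left commented out):
# import adbc_driver_snowflake.dbapi
# with adbc_driver_snowflake.dbapi.connect(uri) as conn:
#     ingest_with_utc_session(conn, "my_table", arrow_table)
#     conn.commit()
```

   This only masks the problem for data that really is in UTC, which is why it 
doesn't generalize.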
   
   @lidavidm @CurtHagenlocher I'm curious what your thoughts are on the above.

