mtsadler-branch commented on issue #38681:
URL: https://github.com/apache/airflow/issues/38681#issuecomment-2034816070

   To reproduce the error, I have the following PyPI packages installed:
   ```
   apache-airflow==2.8.4
   apache-airflow-providers-snowflake==5.3.1
   pandas==1.3.5
   pyarrow==10.0.1
   snowflake-connector-python==3.5.0
   ```
   
   And ran the following code:
   ```Python3
   from airflow.providers.snowflake.hooks.snowflake import SnowflakeHook
   import pandas as pd
   from snowflake.connector.pandas_tools import pd_writer
   
   
   hook = SnowflakeHook(
       # Add SF connection details here...
   )
   
   # Create table
   temp_sf = {
       "database": "test",
       "schema": "scratch",
       "table": "temp_table",
   }
   table_name = f"{temp_sf['database']}.{temp_sf['schema']}.{temp_sf['table']}"
   create_table_query = f"""
       CREATE OR REPLACE TABLE {table_name}
       (COL1 INT, COL2 TIMESTAMP_NTZ(9)) AS
       SELECT 1, '2021-01-01T01:00:00.000000000'::timestamp_ntz
   """
   results = hook.get_pandas_df(create_table_query)
   
   
   # Append new data
   new_data = pd.DataFrame({
       "COL1": [4, 5, 6],
       "COL2": [
           '2021-01-04T04:00:00.000000000',
           '2021-01-05T05:00:00.000000000',
           '2021-01-06T06:00:00.000000000',
       ],
   })
   engine = hook.get_sqlalchemy_engine()
   with engine.connect() as conn:
       # Regardless of whether pyarrow is installed, this will append data to the Snowflake table
       # However, if pyarrow isn't installed, then COL2 will have invalid timestamps
       new_data.to_sql(
           name=table_name,
           con=conn,
           if_exists="append",
           index=False,
           method=pd_writer,
       )
   # If pyarrow is installed, this will return the correct data
   # If pyarrow isn't installed, this will error: Timestamp '(seconds_since_epoch=1712074200000000000)' is not recognized
   results = hook.get_pandas_df(f"SELECT * FROM {table_name}")
   print(results)
   ```
   
   **Results:**
   You'll see `DataFrame.to_sql(con=SnowflakeHook.get_sqlalchemy_engine().connect())` works as expected with `pyarrow==10.0.1`.
   
   But when `pyarrow` isn't installed, the data written to `COL2` isn't in the correct format for `Timestamp()`, which causes an error when trying to query `COL2`.
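
   One way to check what actually landed in the table, without asking the connector to convert `COL2` back to a timestamp on read, might be to describe the table and select the column back as text. This is only an untested diagnostic sketch (the `DESCRIBE`/`TO_VARCHAR` queries are my assumption about where the corruption would show up), reusing the `hook` and `table_name` defined above:
   ```Python3
   # Diagnostic sketch (not part of the repro above): check the column type
   # and the raw stored value without any timestamp conversion on read.
   print(hook.get_pandas_df(f"DESCRIBE TABLE {table_name}"))
   print(hook.get_pandas_df(
       f"SELECT COL1, TO_VARCHAR(COL2) AS COL2_RAW FROM {table_name}"
   ))
   ```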
   
   **Open Questions:**
   Is there a different pattern for inserting/appending data into an existing table that doesn't require `pyarrow`?
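
   One `pyarrow`-free pattern that might fit here (just a sketch on my side, not something confirmed by the provider docs for this case) is `insert_rows`, which `SnowflakeHook` inherits from `DbApiHook`. It issues plain `INSERT` statements, so no `pandas`/`pyarrow` serialization is involved; whether Snowflake then coerces the ISO-8601 strings into `TIMESTAMP_NTZ` for `COL2` is an assumption on my part:
   ```Python3
   # Sketch of a pyarrow-free append, reusing hook, new_data and table_name
   # from the repro above. insert_rows generates ordinary INSERT statements,
   # so pandas/pyarrow serialization is not involved.
   rows = list(new_data.itertuples(index=False, name=None))
   hook.insert_rows(
       table=table_name,
       rows=rows,
       target_fields=["COL1", "COL2"],
       commit_every=1000,
   )
   ```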

