davlee1972 commented on issue #2094: URL: https://github.com/apache/arrow-adbc/issues/2094#issuecomment-2305511691
I'm pretty sure that temporary stage files somehow aren't being processed between adbc_ingest() calls:

```sql
CREATE OR REPLACE TEMPORARY STAGE ADBC$BIND FILE_FORMAT = (TYPE = PARQUET USE_LOGICAL_TYPE = TRUE BINARY_AS_TEXT = FALSE)

PUT 'file:///tmp/placeholder/43.parquet' @ADBC$BIND OVERWRITE = TRUE  -- has zero rows for the first adbc_ingest() call

PUT 'file:///tmp/placeholder/43.parquet' @ADBC$BIND OVERWRITE = TRUE  -- has 146 rows for the second adbc_ingest() call, but it's skipped when COPY INTO runs, maybe because 43.parquet was previously processed??
```

With a single adbc_ingest() call reading 22 small parquet files, I see all 3196 rows inserted. With multiple adbc_ingest() calls, one per each of the 22 parquet files, with a full ONE minute delay and a fresh connection/cursor between calls, I come up short: 2765 / 3196 rows inserted. Three days of data from 3 files were skipped over.

Removing my custom code and calling adbc_ingest() directly with the same cursor also comes up short. Calling adbc_ingest() for 2024-09-05 inserts zero rows.

Swapping in a pyarrow.Table for the pyarrow.RecordBatchReader as the input to adbc_ingest() didn't help either:

```python
cursor.adbc_ingest(snowflake_gateway.source.table_name, dataset.to_table(), mode="append")
```

Two days were skipped.
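For context, here's a minimal sketch of the multi-call loop I'm describing; the URI, table name, and directory are placeholders, not my actual configuration:

```python
# A minimal sketch of the multi-call repro: one adbc_ingest() append per parquet
# file, with a fresh connection/cursor and a one-minute delay between calls.
# SNOWFLAKE_URI, TABLE_NAME, and the glob path are placeholders.
import glob
import time

import pyarrow.parquet as pq
import adbc_driver_snowflake.dbapi

SNOWFLAKE_URI = "user:pass@account/database/schema"  # placeholder
TABLE_NAME = "MY_TABLE"                              # placeholder

for path in sorted(glob.glob("/tmp/placeholder/*.parquet")):
    table = pq.read_table(path)
    with adbc_driver_snowflake.dbapi.connect(SNOWFLAKE_URI) as conn:
        with conn.cursor() as cursor:
            # Each call creates the ADBC$BIND temporary stage, PUTs the file,
            # and runs COPY INTO -- the step that appears to skip files above.
            cursor.adbc_ingest(TABLE_NAME, table, mode="append")
        conn.commit()
    print(f"{path}: {table.num_rows} rows read")
    time.sleep(60)  # full one-minute delay between calls
```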
