davlee1972 commented on issue #2094: URL: https://github.com/apache/arrow-adbc/issues/2094#issuecomment-2305511691
I'm pretty sure that temporary stage files somehow aren't being processed between adbc_ingest() calls:

```sql
CREATE OR REPLACE TEMPORARY STAGE ADBC$BIND FILE_FORMAT = (TYPE = PARQUET USE_LOGICAL_TYPE = TRUE BINARY_AS_TEXT = FALSE)

PUT 'file:///tmp/placeholder/43.parquet' @ADBC$BIND OVERWRITE = TRUE  -- has zero rows for the first adbc_ingest() call

PUT 'file:///tmp/placeholder/43.parquet' @ADBC$BIND OVERWRITE = TRUE  -- has 146 rows for the second adbc_ingest() call, but it's skipped when COPY INTO runs, maybe because 43.parquet was previously processed??
```

With a single adbc_ingest() call reading 22 small parquet files, I see all 3196 rows inserted. With multiple adbc_ingest() calls, one per each of the 22 parquet files, with a full ONE minute delay and a fresh connection/cursor between calls, I come up short: 2765 / 3196 rows inserted. Three days of data from 3 files were skipped over.

Removing my custom code and calling adbc_ingest() directly with the same cursor also comes up short. Calling adbc_ingest() for 2024-09-05 inserts zero rows.

Swapping in a pyarrow.Table for the pyarrow.RecordBatchReader as the input to adbc_ingest() didn't help either:

```python
cursor.adbc_ingest(snowflake_gateway.source.table_name, dataset.to_table(), mode="append")
```

Two days were skipped.
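For context, here's a minimal sketch of the multi-call loop I'm describing; the URI, table name, and directory are placeholders, not my actual configuration:

```python
# A minimal sketch of the multi-call repro: one adbc_ingest() append per parquet
# file, with a fresh connection/cursor and a one-minute delay between calls.
# SNOWFLAKE_URI, TABLE_NAME, and the glob path are placeholders.
import glob
import time

import pyarrow.parquet as pq
import adbc_driver_snowflake.dbapi

SNOWFLAKE_URI = "user:pass@account/database/schema"  # placeholder
TABLE_NAME = "MY_TABLE"                              # placeholder

for path in sorted(glob.glob("/tmp/placeholder/*.parquet")):
    table = pq.read_table(path)
    with adbc_driver_snowflake.dbapi.connect(SNOWFLAKE_URI) as conn:
        with conn.cursor() as cursor:
            # Each call creates the ADBC$BIND temporary stage, PUTs the file,
            # and runs COPY INTO -- the step that appears to skip files above.
            cursor.adbc_ingest(TABLE_NAME, table, mode="append")
        conn.commit()
    print(f"{path}: {table.num_rows} rows read")
    time.sleep(60)  # full one-minute delay between calls
```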
