joellubi commented on issue #1847: URL: https://github.com/apache/arrow-adbc/issues/1847#issuecomment-2115496211
Update on the investigation. I can force a total failure (0 rows ingested) every time by setting `OptionStatementIngestWriterConcurrency` to `1`. If I set it to `2`, two files get uploaded: one contains about half the rows and the other contains 0.

When I download the parquet files from the stage in Snowflake and read them locally, all rows are present in all files. In the test case I'm running there is exactly one empty record batch, and it seems that whichever parquet file that batch gets written to ends up "tainted". I'm not sure exactly in what way yet, but some tools read the file just fine (e.g. DuckDB), while Snowflake parses 0 rows from it and doesn't report an error.
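For reference, here is a minimal local sketch of the kind of file layout that seems to trigger this: an empty record batch written into the same parquet file as a non-empty one, using the Go `pqarrow` writer (which, as I understand it, the ingestion path builds on). The module version (`arrow/go/v16`), schema, and output file name are placeholders, not the driver's actual code.

```go
package main

import (
	"log"
	"os"

	"github.com/apache/arrow/go/v16/arrow"
	"github.com/apache/arrow/go/v16/arrow/array"
	"github.com/apache/arrow/go/v16/arrow/memory"
	"github.com/apache/arrow/go/v16/parquet"
	"github.com/apache/arrow/go/v16/parquet/pqarrow"
)

func main() {
	mem := memory.NewGoAllocator()
	schema := arrow.NewSchema([]arrow.Field{{Name: "val", Type: arrow.PrimitiveTypes.Int64}}, nil)

	bldr := array.NewRecordBuilder(mem, schema)
	defer bldr.Release()

	// Zero-row batch, analogous to the single empty record batch in the failing test case.
	empty := bldr.NewRecord()
	defer empty.Release()

	// A small non-empty batch so the file also contains real rows.
	bldr.Field(0).(*array.Int64Builder).AppendValues([]int64{1, 2, 3}, nil)
	nonEmpty := bldr.NewRecord()
	defer nonEmpty.Release()

	f, err := os.Create("repro.parquet")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	w, err := pqarrow.NewFileWriter(schema, f, parquet.NewWriterProperties(), pqarrow.DefaultWriterProps())
	if err != nil {
		log.Fatal(err)
	}

	// Writing the empty batch first mirrors the scenario where the empty record
	// ends up in one of the generated files ahead of the real data.
	for _, rec := range []arrow.Record{empty, nonEmpty} {
		if err := w.Write(rec); err != nil {
			log.Fatal(err)
		}
	}
	if err := w.Close(); err != nil {
		log.Fatal(err)
	}
}
```

Inspecting the resulting file's row-group metadata (e.g. whether a zero-row row group gets emitted) might help narrow down what Snowflake is objecting to, since local readers seem happy with the files either way.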
