Zan-L commented on issue #1997:
URL: https://github.com/apache/arrow-adbc/issues/1997#issuecomment-2228843267

   @zeroshade @joellubi I created a script to generate dummy data for 
reproducing the issue:
   
   ```python
   import polars as pl
   import pyarrow.dataset as ds
   import adbc_driver_snowflake.dbapi
   
   num_rows = 10_000_000
   parquet_path = 'dummy.parquet'     # placeholder; the actual path was omitted in the original comment
   conn_uri = '<snowflake connection URI>'  # placeholder; the actual URI was omitted in the original comment
   
   lf = pl.LazyFrame({'id': range(num_rows)})
   lf = lf.with_columns(pl.lit('This is just a dummy test string.').alias(f"dummy_string_{i}") for i in range(30))
   lf.sink_parquet(parquet_path)
   data = ds.dataset(parquet_path)
   
   # Splits into 83 files in 1.0 but only 4 in 1.1
   with adbc_driver_snowflake.dbapi.connect(conn_uri, autocommit=True) as conn, conn.cursor() as cursor:
       cursor.adbc_statement.set_options(**{
           'adbc.snowflake.statement.ingest_target_file_size': str(2**14),
           'adbc.snowflake.statement.ingest_writer_concurrency': '4',
       })
       cursor.adbc_ingest('Test', data, 'replace')
   ```
   
   With ADBC 1.0.0, the data were split into 83 parquet files:
   ![ADBC 1.0](https://github.com/user-attachments/assets/212cbfc7-0cfc-4fba-8818-9963c5771889)
   
   But only 4 with ADBC 1.1.0:
   ![ADBC 1.1](https://github.com/user-attachments/assets/401ecfe9-f272-40f5-8d17-adda873967d6)
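   
   For completeness, a row-count check can confirm that the ingest itself finished in either case, so the difference really is only in how the upload was split into files. A minimal sketch, reusing `conn_uri` and the `Test` table from the script above:
   
   ```python
   import adbc_driver_snowflake.dbapi
   
   # Sanity check after running the reproduction script: the table should hold
   # all 10,000,000 rows regardless of how many parquet files were staged.
   # Assumes conn_uri is set to the same Snowflake connection URI as above.
   with adbc_driver_snowflake.dbapi.connect(conn_uri) as conn, conn.cursor() as cursor:
       cursor.execute('SELECT COUNT(*) FROM Test')
       print(cursor.fetchone()[0])  # expect 10000000
   ```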
   