Zan-L commented on issue #1997:
URL: https://github.com/apache/arrow-adbc/issues/1997#issuecomment-2228843267
@zeroshade @joellubi I created a script to generate dummy data for
reproducing the issue:
```python
import polars as pl
import pyarrow.dataset as ds
import adbc_driver_snowflake.dbapi
num_rows = 10_000_000
parquet_path = "dummy.parquet"  # placeholder; any local path for the generated test file
conn_uri = "<snowflake-connection-uri>"  # placeholder; fill in a Snowflake connection URI
lf = pl.LazyFrame({'id': range(num_rows)})
lf = lf.with_columns(
    pl.lit('This is just a dummy test string.').alias(f"dummy_string_{i}")
    for i in range(30)
)
lf.sink_parquet(parquet_path)
data = ds.dataset(parquet_path)
# Splits into 83 files in 1.0 but only 4 in 1.1
with adbc_driver_snowflake.dbapi.connect(conn_uri, autocommit=True) as conn, conn.cursor() as cursor:
    cursor.adbc_statement.set_options(**{
        'adbc.snowflake.statement.ingest_target_file_size': str(2**14),  # 16 KiB target
        'adbc.snowflake.statement.ingest_writer_concurrency': '4',
    })
    cursor.adbc_ingest('Test', data, 'replace')
```
With ADBC 1.0.0, the data were split into 83 Parquet files, but only 4 with ADBC 1.1.0.
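
A 16 KiB (2**14 bytes) target file size should yield far more than 4 files for a dataset of this size. As a rough sanity check, here is a minimal sketch (not part of the original repro; it assumes the size of the source Parquet file roughly approximates the volume uploaded during ingestion):

```python
import os

# Estimate how many output files a 16 KiB target "should" produce, under the
# assumption that the on-disk size of the source Parquet file is a reasonable
# proxy for the amount of data written during ingestion.
target_file_size = 2**14  # bytes, matching the option set in the repro above
parquet_bytes = os.path.getsize(parquet_path)
print(f"source parquet size: {parquet_bytes} bytes")
print(f"naive expected file count: ~{parquet_bytes // target_file_size}")
```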
