judahrand commented on issue #1310:
URL: https://github.com/apache/arrow-adbc/issues/1310#issuecomment-1825436120
As a workaround for now, I think you should be able to do:
```python
import pyarrow as pa
import pyarrow.dataset as ds
from pyarrow import csv

# CHUNK_SIZE and ingest_data() are assumed to come from the test module in the issue.


def test_csv_dataset_batch(base_path):
    """Test adbc_ingest() with csv files using pyarrow.dataset.dataset(),
    read into a pyarrow RecordBatch."""
    dst = ds.dataset(
        base_path,
        format=ds.CsvFileFormat(
            read_options=csv.ReadOptions(use_threads=False, block_size=CHUNK_SIZE)
        ),
        partitioning=ds.FilenamePartitioning(
            pa.schema([("year", pa.int64())]),
        ),
    )
    # to_batches() returns an iterator, so next() consumes the first batch
    record_batches = dst.to_batches()
    table_name = "test_csv_dataset_batch"
    # Recreate table based on the first batch
    ingest_data(name=table_name, data=next(record_batches), mode="replace")
    # Append all subsequent batches to the table
    for record_batch in record_batches:
        ingest_data(name=table_name, data=record_batch, mode="append")
```
Though, I'd agree that it would be preferable (and probably slightly more
performant) if you could just pass the Dataset (or RecordBatchReader, or
Scanner) to the driver.
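
If you end up needing this in more than one place, the replace-then-append pattern from the workaround could be factored into a small helper. This is only a sketch: `ingest_in_batches` is a hypothetical name, and `ingest` stands in for any callable with the same `name`/`data`/`mode` keyword arguments as the `ingest_data()` used above. It also handles an empty dataset, where a bare `next()` would raise `StopIteration`.

```python
from typing import Any, Callable, Iterable

# Sentinel used to detect an empty iterator without confusing it with a real batch.
_MISSING = object()


def ingest_in_batches(
    batches: Iterable[Any],
    ingest: Callable[..., Any],
    name: str,
) -> int:
    """Replace the table with the first batch, then append the rest.

    `ingest` is assumed to accept name=, data= and mode= keyword arguments.
    Returns the number of batches that were ingested.
    """
    iterator = iter(batches)
    first = next(iterator, _MISSING)
    if first is _MISSING:
        # Nothing to ingest; leave any existing table untouched.
        return 0

    # Recreate the table from the first batch.
    ingest(name=name, data=first, mode="replace")
    count = 1

    # Append every remaining batch.
    for batch in iterator:
        ingest(name=name, data=batch, mode="append")
        count += 1
    return count
```

With the test above, that would be something like `ingest_in_batches(dst.to_batches(), ingest_data, table_name)`.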