Re: [I] Errors using pyarrow Dataset with adbc_ingest() for adbc_driver_postgres() [arrow-adbc]

via GitHub Fri, 24 Nov 2023 02:10:44 -0800


judahrand commented on issue #1310:
URL: https://github.com/apache/arrow-adbc/issues/1310#issuecomment-1825436120


   As a workaround for now, I think you should be able to do:
   
   ```python
   import pyarrow as pa
   import pyarrow.csv as csv
   import pyarrow.dataset as ds

   # CHUNK_SIZE and ingest_data() are assumed to be defined as in the issue's test code.
   def test_csv_dataset_batch(base_path):
       """Test adbc_ingest() with csv files using pyarrow.dataset.dataset(), read into a pyarrow RecordBatch."""
       dst = ds.dataset(
           base_path,
           format=ds.CsvFileFormat(
               read_options=csv.ReadOptions(use_threads=False, block_size=CHUNK_SIZE)
           ),
           partitioning=ds.FilenamePartitioning(
               pa.schema([("year", pa.int64())]),
           ),
       )
       record_batches = dst.to_batches()
       table_name = "test_csv_dataset_batch"
       # Recreate table based on the first batch
       ingest_data(name=table_name, data=next(record_batches), mode="replace")
       # Append all subsequent batches to the table
       for record_batch in record_batches:
           ingest_data(name=table_name, data=record_batch, mode="append")
   ```
   
   Though, I'd agree that it would be preferable (and probably slightly more 
performant) if you could just pass the Dataset (or RecordBatchReader, or 
Scanner) to the driver.
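
The replace-then-append pattern above can also be factored into a small helper. This is only a sketch: `ingest_in_batches` and its `ingest` parameter are hypothetical names, and `ingest` is assumed to be any callable with the same `name`/`data`/`mode` keywords as the `ingest_data()` helper used in the workaround. It also handles the case where the dataset yields no batches, in which a bare `next()` would raise `StopIteration`.

```python
def ingest_in_batches(batches, ingest, table_name):
    """Ingest an iterable of batches: replace with the first, append the rest.

    `ingest` is a hypothetical callable taking name=, data= and mode= keywords.
    Returns the number of batches ingested.
    """
    iterator = iter(batches)
    try:
        first = next(iterator)
    except StopIteration:
        # Nothing to ingest, so the target table is left untouched.
        return 0

    # Recreate the table from the first batch.
    ingest(name=table_name, data=first, mode="replace")
    count = 1

    # Append every remaining batch to the freshly created table.
    for batch in iterator:
        ingest(name=table_name, data=batch, mode="append")
        count += 1
    return count
```

With the workaround above, this would presumably be called as `ingest_in_batches(dst.to_batches(), ingest_data, "test_csv_dataset_batch")`.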

