jorisvandenbossche commented on issue #43410: URL: https://github.com/apache/arrow/issues/43410#issuecomment-2299154176
Specifically for `pq.write_table()`, this might be a bit trickier (without consuming the stream) because it currently uses `parquet::arrow::FileWriter::WriteTable`, which explicitly requires a table input. The FileWriter interface also supports writing record batches, so we could rewrite the code a bit to iterate over the batches of the stream (but at that point, should that still live in something called `write_table`?). See the first sketch below.

But in general, certainly +1 on supporting the interface more widely. Some other possible areas:

- The dataset API for writing. `pyarrow.dataset.write_dataset` already accepts a record batch reader, so this should be straightforward to extend (usage example below).
- Compute functions from `pyarrow.compute`? Those could certainly accept objects with `__arrow_c_array__`, and in theory also `__arrow_c_stream__`, but they would fully consume the stream and return a materialized result, so I'm not sure whether that would be expected. (Although, if you know those functions, that is kind of expected, so maybe this just requires good documentation.)
- Many of the methods on the Array/RecordBatch/Table classes accept similar objects (e.g. `arr.take(..)`). Not sure if we want to make those work with interface objects as well. What we currently accept as input is also a bit inconsistent (only strictly a pyarrow array, or also a numpy array, a list, anything array-like, or any sequence or collection?). If we harmonized that with some helper, we could at the same time easily add support for any arrow-array-like object (see the last sketch below).
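
To illustrate the `write_table` point, here is a minimal sketch of writing a stream batch by batch instead of materializing a full table first. `write_stream_to_parquet` is a hypothetical helper (not an existing pyarrow API), and it assumes a recent pyarrow that has `RecordBatchReader.from_stream` for importing objects exposing `__arrow_c_stream__`:

```python
import pyarrow as pa
import pyarrow.parquet as pq


def write_stream_to_parquet(source, where):
    # Accept a RecordBatchReader or any object exposing __arrow_c_stream__
    # and write it batch by batch, so the stream is never fully materialized
    # in memory the way write_table's table input is.
    reader = pa.RecordBatchReader.from_stream(source)
    with pq.ParquetWriter(where, reader.schema) as writer:
        for batch in reader:
            writer.write_batch(batch)
```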
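
For the dataset point, `write_dataset` already takes a record batch reader today, so the stream case works without going through a table. A small usage example (the schema and batches are just made up for illustration):

```python
import pyarrow as pa
import pyarrow.dataset as ds

schema = pa.schema([("x", pa.int64())])
batches = (
    pa.RecordBatch.from_pydict({"x": [i, i + 1]}, schema=schema)
    for i in range(3)
)
reader = pa.RecordBatchReader.from_batches(schema, batches)

# The reader is consumed lazily while the dataset files are written.
ds.write_dataset(reader, "example_dataset", format="parquet")
```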
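
And for the last bullet, the kind of helper I have in mind could look roughly like this. `_as_arrow_array` is purely hypothetical (nothing like it is settled), and it assumes that recent `pa.array()` can import objects exposing the Arrow PyCapsule interface:

```python
import pyarrow as pa


def _as_arrow_array(obj):
    # Hypothetical normalization helper that methods like arr.take(...)
    # could funnel their input through, so any arrow-array-like object is
    # accepted consistently.
    if isinstance(obj, pa.Array):
        return obj
    if hasattr(obj, "__arrow_c_array__"):
        # Assumption: pa.array() understands __arrow_c_array__ in recent
        # pyarrow versions and imports the data directly.
        return pa.array(obj)
    # Fall back to the existing conversion paths (numpy array, list, ...).
    return pa.array(obj)
```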
