jorisvandenbossche commented on issue #35531: URL: https://github.com/apache/arrow/issues/35531#issuecomment-1542393207
> Also, this proposal doesn't dwell on the consumer side. Would there be higher-level APIs to construct `Array` and `RecordBatch` from those capsules?

Yes, indeed, I didn't touch on that aspect yet. I think it could certainly be useful, but I thought to start with the producer side of things. And some consumers might already have an entry point that could be reused for this (for example, duckdb already implicitly reads from any object that is a pandas DataFrame, pyarrow Table, RecordBatch, Dataset/Scanner, RecordBatchReader, polars DataFrame, ..., and they could just extend this to any object implementing this protocol).

Making the parallel with DLPack again: they recommend that libraries implement a `from_dlpack` function as the consumer interface. We could make a similar recommendation here (for example `from_arrow`, although that might need to differentiate between stream/array/schema), but that's maybe less essential initially? (That's more about the user-facing API.)
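To illustrate, a hypothetical `from_arrow` consumer entry point could dispatch on which protocol method the producer object exposes. This is only a sketch: the dunder names (`__arrow_c_stream__`, `__arrow_c_array__`, `__arrow_c_schema__`) are assumed here for illustration, and the `import_stream`/`import_array`/`import_schema` callbacks stand in for a consuming library's own capsule-import routines; none of this is settled API from the proposal.

```python
def from_arrow(obj, *, import_stream, import_array, import_schema):
    """Hypothetical consumer-side entry point (sketch, not a real API).

    Dispatches on which Arrow protocol method the producer exposes and
    hands the resulting capsule(s) to the library's own import routine.
    """
    if hasattr(obj, "__arrow_c_stream__"):
        # Stream producer (e.g. a RecordBatchReader-like object)
        return import_stream(obj.__arrow_c_stream__())
    if hasattr(obj, "__arrow_c_array__"):
        # Single array / record batch producer
        return import_array(obj.__arrow_c_array__())
    if hasattr(obj, "__arrow_c_schema__"):
        # Schema-only producer
        return import_schema(obj.__arrow_c_schema__())
    raise TypeError(
        f"{type(obj).__name__} does not implement the Arrow protocol"
    )


# Toy producer standing in for a third-party library object;
# a real producer would return a PyCapsule, not a string.
class FakeStreamProducer:
    def __arrow_c_stream__(self):
        return "stream-capsule"


result = from_arrow(
    FakeStreamProducer(),
    import_stream=lambda capsule: ("imported", capsule),
    import_array=None,
    import_schema=None,
)
print(result)  # ('imported', 'stream-capsule')
```

This mirrors the `from_dlpack` pattern: one user-facing function, with the protocol-method check deciding which lower-level import path to take.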
