[GitHub] [arrow] wjones127 commented on pull request #35568: GH-33986: [Python] Add a minimal protocol for datasets

via GitHub Wed, 14 Jun 2023 10:36:47 -0700


wjones127 commented on PR #35568:
URL: https://github.com/apache/arrow/pull/35568#issuecomment-1591718615


   > So is the assumption here that the producer and the consumer (in your
   > diagram) are the same library? E.g. both are pyarrow (pyarrow has code for
   > producing datasets and for scanning datasets)? Or is the goal to be able to
   > produce datasets with one library and consume them with a different library?
   
   Different libraries. Producers are libraries like `lance`, `deltalake`, and 
`pyiceberg`. Consumers are libraries like `duckdb`, `polars`, `datafusion` and 
`dask`.
   
   You could say the status quo is that the consumer can be any library, but 
the producer is assumed to be pyarrow. This protocol helps open up the producer 
side to be other libraries.
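
   To illustrate the producer/consumer split described above, here is a rough sketch using `typing.Protocol`. It is not the interface proposed in this PR: the names `Scannable`, `scanner`, `ToyProducer`, and `consume` are hypothetical, and plain dicts stand in for Arrow record batches so the snippet runs without pyarrow.

```python
from typing import Iterator, List, Optional, Protocol, runtime_checkable

# Stand-in for an Arrow record batch (assumption: a real protocol would
# exchange Arrow data, not dicts).
Batch = dict


@runtime_checkable
class Scannable(Protocol):
    """Hypothetical minimal contract that a producer library would implement."""

    def scanner(self, columns: Optional[List[str]] = None) -> Iterator[Batch]:
        ...


class ToyProducer:
    """Plays the role of a producer; it does not inherit from Scannable."""

    def __init__(self, rows: List[Batch]) -> None:
        self._rows = rows

    def scanner(self, columns: Optional[List[str]] = None) -> Iterator[Batch]:
        # Apply the column projection on the producer side.
        for row in self._rows:
            yield {k: v for k, v in row.items() if columns is None or k in columns}


def consume(dataset: Scannable, columns: Optional[List[str]] = None) -> List[Batch]:
    """Plays the role of a consumer; it relies only on the protocol."""
    if not isinstance(dataset, Scannable):
        raise TypeError("dataset does not implement the scanner() method")
    return list(dataset.scanner(columns=columns))


rows = [{"a": 1, "b": 2}, {"a": 3, "b": 4}]
projected = consume(ToyProducer(rows), columns=["a"])
```

   The consumer never imports the producer's class; it only checks for the agreed-upon method, which is what would let the producer be something other than pyarrow.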


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
