wjones127 commented on PR #35568: URL: https://github.com/apache/arrow/pull/35568#issuecomment-1591718615
> So is the assumption here that the producer and the consumer (in your diagram) are the same library? E.g. both are pyarrow (pyarrow has code for producing datasets and for scanning datasets)? Or is the goal to be able to produce datasets with one library and consume them with a different library?

Different libraries. Producers are libraries like `lance`, `deltalake`, and `pyiceberg`. Consumers are libraries like `duckdb`, `polars`, `datafusion`, and `dask`. You could say that in the status quo the consumer can be any library, but the producer is assumed to be pyarrow. This protocol helps open up the producer side to other libraries.
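To make the producer/consumer split concrete, here is a minimal sketch of the idea using Python's structural typing (`typing.Protocol`). It is not the API proposed in this PR: the names `DatasetLike`, `scan`, `column_names`, `ToyTableFormat` and `consumer_sum` are all hypothetical, and batches are plain dicts of column lists rather than Arrow record batches so the example has no dependencies. The point it shows is that the consumer depends only on the protocol, so the producer does not have to be pyarrow.

```python
from typing import Dict, Iterator, List, Optional, Protocol, runtime_checkable

# Hypothetical stand-in for a record batch: column name -> list of values.
Batch = Dict[str, list]


@runtime_checkable
class DatasetLike(Protocol):
    """Hypothetical producer-side interface (illustration only, not the PR's API)."""

    def column_names(self) -> List[str]: ...

    def scan(self, columns: Optional[List[str]] = None) -> Iterator[Batch]: ...


class ToyTableFormat:
    """A producer that never imports pyarrow (in the role of lance, deltalake, pyiceberg)."""

    def __init__(self, files: List[Batch]) -> None:
        self._files = files

    def column_names(self) -> List[str]:
        return list(self._files[0].keys()) if self._files else []

    def scan(self, columns: Optional[List[str]] = None) -> Iterator[Batch]:
        # Yield one batch per "file", projected down to the requested columns.
        for batch in self._files:
            cols = columns if columns is not None else list(batch)
            yield {name: batch[name] for name in cols}


def consumer_sum(dataset: DatasetLike, column: str) -> int:
    """A consumer (in the role of duckdb or polars) that relies only on the protocol."""
    # runtime_checkable only verifies that the methods exist, not their signatures.
    if not isinstance(dataset, DatasetLike):
        raise TypeError("object does not implement the dataset protocol")
    total = 0
    for batch in dataset.scan(columns=[column]):
        total += sum(batch[column])
    return total
```

With `ds = ToyTableFormat([{"x": [1, 2], "y": ["a", "b"]}, {"x": [3], "y": ["c"]}])`, calling `consumer_sum(ds, "x")` returns `6` even though neither side knows about the other's concrete class.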