wjones127 commented on issue #33986:
URL: https://github.com/apache/arrow/issues/33986#issuecomment-1542459532
Yes, I agree that what we want is (1): "An interface for consuming data from a dataset-like object, without having to be a pyarrow.dataset.Dataset (or Scanner) instance."

My thinking is this: we have table formats with Python libraries (Delta Lake, Iceberg, and Lance), and we have single-node query engines (DuckDB, Polars, and DataFusion). It would be cool if we could pass any of the table formats into any of the query engines, all with one protocol. We have a prototype of this that works well in some ways, but to be fully viable it needs to be turned into a proper, well-defined protocol.
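
To make the "one protocol" idea concrete, here is a minimal sketch of what a producer/consumer split could look like. Every name in it (`DatasetLike`, `column_names`, `scan`, `InMemoryTable`, `sum_column`) is hypothetical and chosen for illustration; none of it is pyarrow API or the prototype mentioned above, and batches are plain dicts of lists rather than Arrow record batches so the example stays standard-library only.

```python
from typing import Iterator, Optional, Protocol, Sequence, runtime_checkable

# Hypothetical stand-in for a record batch: column name -> list of values.
Batch = dict[str, list]


@runtime_checkable
class DatasetLike(Protocol):
    """Hypothetical protocol: report column names and stream batches."""

    def column_names(self) -> list[str]: ...

    def scan(self, columns: Optional[Sequence[str]] = None) -> Iterator[Batch]: ...


class InMemoryTable:
    """Stand-in for a table-format library (the producer side).

    It does not inherit from DatasetLike; it only provides the same methods.
    """

    def __init__(self, data: Batch, batch_size: int = 2) -> None:
        self._data = data
        self._batch_size = batch_size

    def column_names(self) -> list[str]:
        return list(self._data)

    def scan(self, columns: Optional[Sequence[str]] = None) -> Iterator[Batch]:
        # Only the requested columns are materialized (a simple projection).
        names = list(columns) if columns is not None else self.column_names()
        num_rows = len(next(iter(self._data.values()), []))
        for start in range(0, num_rows, self._batch_size):
            stop = start + self._batch_size
            yield {name: self._data[name][start:stop] for name in names}


def sum_column(dataset: DatasetLike, column: str) -> float:
    """Stand-in for a query engine (the consumer side).

    It relies only on the protocol methods, not on a concrete class.
    """
    if not isinstance(dataset, DatasetLike):
        raise TypeError("object does not implement the DatasetLike protocol")
    return sum(sum(batch[column]) for batch in dataset.scan(columns=[column]))
```

With this shape, any library that offers the two methods can be handed to any engine that consumes them, for example `sum_column(InMemoryTable({"x": [1, 2, 3]}), "x")`. A real protocol would presumably exchange Arrow data and would also have to settle questions this sketch ignores, such as filter pushdown and schema types.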
