lidavidm commented on issue #33986: URL: https://github.com/apache/arrow/issues/33986#issuecomment-1416443098
I don't think it's either-or, I think we're tangling two concerns up here.

> My proposal here is to expose Dataset/Scanner python abc's with s.t. rust libraries can extend via pyo3+python so higher level tooling (like duckdb for example, can query these without having to transfer the whole Table into memory first).

Dataset is sort of an API standard for this, or at least you can press it into service. But Dataset is also useful in its own right and a meaningful abstraction, so people want to extend it. If we define a new API, of course Dataset should implement it!

> How do I extend Dataset from a separate package

Dataset is already extensible. The real problem is the Python integration, and wheels/packaging questions on top of that. The toplevel proposal sounds like sidestepping that entirely by introducing a separate abstraction layer at the Python level (hence, exposing ABCs in Python).

> IMO the current DuckDB integration feels a little silly.

I don't think they had a choice, because there's not really a formal API for what they really want :)
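
To make the idea of a separate abstraction layer at the Python level a bit more concrete, here is a rough sketch of what such ABCs might look like. Everything in it is hypothetical: `AbstractDataset`, `AbstractScanner`, `InMemoryDataset`, `InMemoryScanner`, `count_rows`, and the dict-of-lists `Batch` stand-in are illustrative names, not pyarrow APIs, and a real design would presumably exchange Arrow record batches rather than plain Python lists.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, Iterator, List, Optional, Sequence

# Hypothetical stand-in for an Arrow record batch: column name -> values.
Batch = Dict[str, List[Any]]


class AbstractScanner(ABC):
    """Hypothetical interface: yields batches lazily, never one big table."""

    @abstractmethod
    def to_batches(self) -> Iterator[Batch]:
        ...


class AbstractDataset(ABC):
    """Hypothetical interface a third-party package could implement."""

    @abstractmethod
    def column_names(self) -> List[str]:
        ...

    @abstractmethod
    def scanner(self, columns: Optional[Sequence[str]] = None) -> AbstractScanner:
        ...


class InMemoryScanner(AbstractScanner):
    def __init__(self, batches: List[Batch], columns: Sequence[str]) -> None:
        self._batches = batches
        self._columns = list(columns)

    def to_batches(self) -> Iterator[Batch]:
        for batch in self._batches:
            # The producer applies the column projection itself.
            yield {name: batch[name] for name in self._columns}


class InMemoryDataset(AbstractDataset):
    """Toy implementation; a real one could wrap a Rust library via pyo3."""

    def __init__(self, batches: Sequence[Batch]) -> None:
        self._batches = list(batches)

    def column_names(self) -> List[str]:
        return list(self._batches[0].keys()) if self._batches else []

    def scanner(self, columns: Optional[Sequence[str]] = None) -> AbstractScanner:
        selected = self.column_names() if columns is None else columns
        return InMemoryScanner(self._batches, selected)


def count_rows(dataset: AbstractDataset, column: str) -> int:
    """A consumer that depends only on the abstract interface."""
    scanner = dataset.scanner(columns=[column])
    return sum(len(batch[column]) for batch in scanner.to_batches())
```

A consumer such as a query engine would then only need to accept anything that implements `AbstractDataset`, e.g. `count_rows(InMemoryDataset([{"a": [1, 2]}, {"a": [3]}]), "a")`, and pull batches one at a time instead of materializing a whole table first.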
