lidavidm commented on issue #33986:
URL: https://github.com/apache/arrow/issues/33986#issuecomment-1416443098

   I don't think it's either-or; I think we're tangling up two concerns here.
   
   > My proposal here is to expose Dataset/Scanner Python ABCs so that Rust 
libraries can extend them via pyo3+Python, so higher-level tooling (like DuckDB, 
for example) can query these without having to transfer the whole Table into 
memory first.
   
   Dataset is sort of an API standard for this, or at least you can press it 
into service. But Dataset is also useful in its own right and a meaningful 
abstraction, so people want to extend it.
   
   If we define a new API, of course Dataset should implement it!
   
   > How do I extend Dataset from a separate package
   
   Dataset is already extensible. The real problem is the Python integration, 
and the wheels/packaging questions on top of that. The top-level proposal sounds 
like it sidesteps that entirely by introducing a separate abstraction layer at 
the Python level (hence, exposing ABCs in Python).
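   
   To make "exposing ABCs in Python" concrete, here is a rough sketch of what 
such a layer could look like. Every name in it (`AbstractDataset`, 
`AbstractScanner`, `InMemoryDataset`, `count_rows`) is hypothetical and not an 
existing pyarrow API, and plain lists of dicts stand in for real record batches. 
It only illustrates the shape of the idea: a consumer codes against the abstract 
interface, and a separate package supplies the implementation.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, Iterator, List, Optional, Sequence

Row = Dict[str, Any]
Batch = List[Row]  # stand-in for a real record batch type (assumption)


class AbstractScanner(ABC):
    """Hypothetical interface: produces data incrementally."""

    @abstractmethod
    def to_batches(self) -> Iterator[Batch]:
        """Yield the data one batch at a time."""


class AbstractDataset(ABC):
    """Hypothetical interface a third-party package could implement."""

    @abstractmethod
    def scanner(self, columns: Optional[Sequence[str]] = None) -> AbstractScanner:
        """Return a scanner, optionally restricted to some columns."""


class InMemoryScanner(AbstractScanner):
    """Toy implementation that slices a list of rows into batches."""

    def __init__(self, rows: List[Row], columns: Sequence[str], batch_size: int):
        self._rows = rows
        self._columns = list(columns)
        self._batch_size = batch_size

    def to_batches(self) -> Iterator[Batch]:
        for start in range(0, len(self._rows), self._batch_size):
            chunk = self._rows[start:start + self._batch_size]
            # Project each row down to the requested columns.
            yield [{c: row[c] for c in self._columns} for row in chunk]


class InMemoryDataset(AbstractDataset):
    """Toy dataset; a real one might be backed by a Rust library via pyo3."""

    def __init__(self, rows: List[Row], batch_size: int = 2):
        self._rows = rows
        self._batch_size = batch_size

    def scanner(self, columns: Optional[Sequence[str]] = None) -> AbstractScanner:
        if columns is None:
            # Default to all columns of the first row (empty if no rows).
            columns = list(self._rows[0]) if self._rows else []
        return InMemoryScanner(self._rows, columns, self._batch_size)


def count_rows(dataset: AbstractDataset) -> int:
    """Example consumer: relies only on the abstract interface."""
    return sum(len(batch) for batch in dataset.scanner().to_batches())
```

   A consumer like `count_rows` only ever sees one batch at a time, which is 
the property the proposal is after; whether the batches come from Dataset 
itself or from another package is invisible to it.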
   
   > IMO the current DuckDB integration feels a little silly.
   
   I don't think they had a choice, because there's not really a formal API for 
what they really want :)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
