changhiskhan opened a new issue, #33986:
URL: https://github.com/apache/arrow/issues/33986

   ### Describe the enhancement requested
   
   As the Arrow ecosystem grows ever richer, desire paths emerge :)
   
   Integrating Arrow-based projects written in Rust works great across the C data interface, but it doesn't allow lazy execution or pushdowns the way pyarrow's Dataset/Scanner classes do.
   
   My proposal is to expose Dataset/Scanner Python ABCs so that Rust libraries can implement them via pyo3 + Python. Higher-level tooling (DuckDB, for example) could then query these datasets without first transferring the whole Table into memory.
   
   In keeping with the same principles as the C data interface, I think this Python interface could be very minimal: a Dataset with `schema` and `scanner` methods, and a Scanner with `projected_schema` and `to_reader` methods. `to_reader` should return a RecordBatchReader, which would then link pyo3 datasets into the Arrow C data interface.
   
   Thanks for your consideration!
   
   ### Component(s)
   
   Python

