kylebarron opened a new issue, #1227:
URL: https://github.com/apache/datafusion-python/issues/1227

   **Is your feature request related to a problem or challenge? Please describe 
what you are trying to do.**
   
   PyArrow is a massive dependency. Unpacked, it tends to be >100MB in size, 
and until recent versions (I think?) it also required numpy as a 
non-optional dependency.
   
   It's also, in effect, the only current dependency:
   
https://github.com/apache/datafusion-python/blob/f0bbad7543717c5f08ba2acb92d42c9d30fd2355/pyproject.toml#L46
   
   It would be great if we could remove it, since that would greatly reduce 
the minimal environment size for datafusion-python.
   
   [Many other Python Arrow 
libraries](https://github.com/apache/arrow/issues/39195#issuecomment-2245718008)
 implement the PyCapsule Interface, so the user can use nanoarrow, arro3, 
Polars, DuckDB, etc., or pyarrow, whichever is best for them.
   
   **Describe the solution you'd like**
   
   The Arrow PyCapsule Interface is a lightweight, decentralized protocol for 
sharing Arrow data between Python libraries. We already implement the PyCapsule 
Interface, so it's just a matter of removing places where we hard-code use of 
pyarrow.
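
   As a rough sketch of what "not hard-coding pyarrow" could look like: inputs 
can be recognized by the dunder methods the PyCapsule Interface defines 
(`__arrow_c_stream__`, `__arrow_c_array__`, `__arrow_c_schema__`) rather than 
by an `isinstance` check against pyarrow types. The helper name 
`arrow_export_kind` is hypothetical and not part of datafusion-python; this 
only illustrates the detection step, not the actual import of the data.

```python
from typing import Any, Optional


def arrow_export_kind(obj: Any) -> Optional[str]:
    """Report which Arrow PyCapsule export an object offers, if any.

    Hypothetical helper for illustration only. It checks for the dunder
    methods defined by the Arrow PyCapsule Interface and never imports
    pyarrow, so it works for any producer that implements the protocol.
    """
    # A stream of record batches (e.g. a table or a batch reader).
    if hasattr(obj, "__arrow_c_stream__"):
        return "stream"
    # A single array or record batch.
    if hasattr(obj, "__arrow_c_array__"):
        return "array"
    # A schema, field, or data type.
    if hasattr(obj, "__arrow_c_schema__"):
        return "schema"
    # Not an Arrow PyCapsule producer.
    return None
```

   With a check like this, an object from nanoarrow, arro3, Polars, or pyarrow 
would be accepted the same way, as long as it exposes one of these methods.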
   
   **Describe alternatives you've considered**
   
   Keep the pyarrow dependency.
   
   **Additional context**
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.