gabotechs opened a new issue, #1612:
URL: https://github.com/apache/datafusion-python/issues/1612

   **Is your feature request related to a problem or challenge? Please describe 
what you are trying to do.**
   
   Allow running distributed queries in `datafusion-python`
   
   **Describe the solution you'd like**
   
   Ideally something well integrated with `datafusion-python` that does not 
require big changes or using different APIs for executing distributed queries.
   
   `datafusion-python` is already a very ergonomic wrapper for using 
`datafusion` from within Python, so something that maintains that philosophy 
without introducing a lot of API surface would be ideal.
   
   I'm interested specifically in using the `datafusion-distributed` library 
from within Python, and I see three mutually exclusive ways of integrating 
it:
   - Make `datafusion-python` depend on `datafusion-distributed`, hiding some 
internal plumbing and extending the current API to provide distributed 
capabilities.
   - Create an external crate that depends on both `datafusion-distributed` and 
`datafusion-python` and ships an external API for using distributed 
functionality in `datafusion-python`.
   - Make `datafusion-distributed` depend on `datafusion-python`, providing a 
set of functions and classes that decorate `datafusion-python` with distributed 
capabilities.
   
   I'm not sure which approach aligns best with this project's philosophy. My 
naive intuition, as someone unfamiliar with this project, is that the first 
option has the greatest chance of providing a well-integrated experience, and 
it's probably the easiest to implement, since the internal plumbing in the 
Rust world can be hidden inside this project.
   
   I actually tried this here:
   - https://github.com/apache/datafusion-python/pull/1611
   
   The fact that only ~1K LOC, examples and tests included, yields a 
functional integration makes me think it might actually not be a bad 
idea. But again, I don't know what I don't know, so I would very gladly accept 
feedback and suggestions for a different approach.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

