gabotechs opened a new issue, #1612: URL: https://github.com/apache/datafusion-python/issues/1612
**Is your feature request related to a problem or challenge? Please describe what you are trying to do.**

Allow running distributed queries in `datafusion-python`.

**Describe the solution you'd like**

Ideally, something well integrated with `datafusion-python` that does not require big changes or different APIs for executing distributed queries. `datafusion-python` is already a very ergonomic wrapper for using `datafusion` from Python, so a solution that keeps that philosophy without introducing a lot of new API surface would be ideal.

I'm specifically interested in using the `datafusion-distributed` library from Python, and I see three mutually exclusive ways of integrating it:

- Make `datafusion-python` depend on `datafusion-distributed`, hiding the internal plumbing and extending the current API with distributed capabilities.
- Create an external crate that depends on both `datafusion-distributed` and `datafusion-python` and ships a separate API for using distributed functionality from `datafusion-python`.
- Make `datafusion-distributed` depend on `datafusion-python`, providing a set of functions and classes that decorate `datafusion-python` with distributed capabilities.

I'm not sure which approach best aligns with this project's philosophy. My naive intuition, as someone unfamiliar with the project, is that the first option has the best chance of providing a well-integrated experience. It is probably also the easiest to implement, because the internal plumbing on the Rust side can be hidden inside this project. I actually tried this here:

- https://github.com/apache/datafusion-python/pull/1611

The fact that a functional integration took only ~1K LOC, examples and tests included, makes me think it might not be a bad idea. But again, I don't know what I don't know, so I would gladly accept feedback and suggestions for a different approach.
