milenkovicm opened a new issue, #1091: URL: https://github.com/apache/datafusion-ballista/issues/1091
**Is your feature request related to a problem or challenge? Please describe what you are trying to do.**

With the changes made in #1088 and the introduction of `SessionContextExt`, we could change `pyballista` to support the datafusion-python context directly instead of `BallistaContext`. This would unify the datafusion and ballista Python interfaces, enabling users to switch from a single-node deployment to a cluster deployment with a single-line change.

**Describe the solution you'd like**

I don't think we need to reinvent the wheel here; we just need to copy what <https://github.com/apache/datafusion-ray> is doing and do the same for ballista.

This PR should provide support for the methods provided by `SessionContextExt`. Something similar to:

```python
from datafusion.context import SessionContext
from pyballista import StandaloneBallista, RemoteBallista

ctx: SessionContext = StandaloneBallista()
df = ctx.sql("SELECT 1")
```

## Propose a proper, Pythonic way to initialize `datafusion::PySessionContext`

I'm not a Python expert, so I can't really propose an ergonomic Python interface. I'm not sure whether we should use objects like `StandaloneBallista` or static methods like `Ballista.standalone()`. The latter would be trivial to implement, although it may not be the most ergonomic option in Python:

```rust
use ballista::prelude::SessionContextExt;
use datafusion::prelude::SessionContext;
use datafusion_python::{context::PySessionContext, utils::wait_for_future};
use pyo3::prelude::*;

#[pymethods]
impl Ballista {
    /// Create a standalone Ballista session and expose it to Python
    /// as a datafusion-python session context.
    #[staticmethod]
    pub fn standalone(py: Python) -> PyResult<PySessionContext> {
        // `SessionContext::standalone()` returns a future; block on it here.
        let session_context = SessionContext::standalone();
        let ctx = wait_for_future(py, session_context)?;
        Ok(ctx.into())
    }
}
```

It would be great if we aligned with the datafusion-ray approach.

## Make `standalone` an optional dependency

Can we make Ballista `standalone` an optional dependency?
```bash
pip install pyballista
```

should install remote mode only, and

```bash
pip install 'pyballista[standalone]'
```

should install both remote and standalone modes, providing an easy way to test ballista applications.

## Consider renaming the Python package

As `pyballista` is not published, can we consider renaming the package to something like `datafusion-distributed` or `datafusion-ballista` to align with other packages? To me, `datafusion-distributed` makes sense, and we could consider renaming the `ballista` (client) crate to the same name, keeping the `ballista-` prefix for the executor and scheduler.

**Describe alternatives you've considered**

A clear and concise description of any alternative solutions or features you've considered.

**Additional context**

Do we need to keep the current pyballista implementation, or can we remove it with this PR?

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

---------------------------------------------------------------------
For additional commands, e-mail: github-h...@datafusion.apache.org