milenkovicm opened a new issue, #1091:
URL: https://github.com/apache/datafusion-ballista/issues/1091

   **Is your feature request related to a problem or challenge? Please describe 
what you are trying to do.**
   
   With the changes made in #1088 and the introduction of `SessionContextExt`, we
   could change `pyballista` to support the datafusion-python context directly
   instead of `BallistaContext`. This would unify the datafusion and ballista
   Python interfaces, enabling users to switch from a single-node deployment to
   a cluster deployment with a single-line change.
   
   **Describe the solution you'd like**
   
   I don't think we need to reinvent the wheel here; we just need to copy what
   <https://github.com/apache/datafusion-ray> is doing and do the same for
   ballista. This PR should provide support for the methods provided by
   `SessionContextExt`.
   
   Something similar to:
   
   ```python
   from datafusion.context import SessionContext
   from pyballista import StandaloneBallista, RemoteBallista
   
   ctx: SessionContext = StandaloneBallista()
   
   df = ctx.sql("SELECT 1")
   ```
   
   ## Propose a proper, Pythonic way to initialize `datafusion_python::context::PySessionContext`
   
   I'm not a Python expert, so I can't really propose an ergonomic Python
   interface. I'm not sure whether we should use objects like
   `StandaloneBallista` or static methods like `Ballista.standalone()`. The
   latter would be trivial to implement, although it may not be the most
   ergonomic option in Python:
   
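   ```rust
   use ballista::prelude::SessionContextExt;
   use datafusion::prelude::SessionContext;
   use datafusion_python::{context::PySessionContext, utils::wait_for_future};
   use pyo3::prelude::*;
   
   // Empty type that only serves as a namespace for the static constructors.
   #[pyclass]
   pub struct Ballista;
   
   #[pymethods]
   impl Ballista {
       // Exposed to Python as `Ballista.standalone()`.
       #[staticmethod]
       pub fn standalone(py: Python) -> PyResult<PySessionContext> {
           // `standalone()` comes from `SessionContextExt` and returns a future.
           let session_context = SessionContext::standalone();
           // Block on the future, then wrap the context for Python.
           let ctx = wait_for_future(py, session_context)?;
           Ok(ctx.into())
       }
   }
   ```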
   
   It would be great if we aligned with the datafusion-ray approach.
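
To make the two candidate API shapes easier to compare from the Python side, here is a minimal, self-contained sketch. It is not the actual `pyballista` or datafusion-python code: `SessionContext` below is a stand-in dataclass, and the `mode` and `url` fields, the `remote` factory, and the example URL are assumptions made only for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SessionContext:
    """Stand-in for datafusion's SessionContext (hypothetical fields)."""

    mode: str = "local"
    url: Optional[str] = None


# Option A: one class per deployment mode, each usable as a SessionContext.
class StandaloneBallista(SessionContext):
    def __init__(self) -> None:
        super().__init__(mode="standalone")


class RemoteBallista(SessionContext):
    def __init__(self, url: str) -> None:
        super().__init__(mode="remote", url=url)


# Option B: a namespace of static factories returning a plain SessionContext.
class Ballista:
    @staticmethod
    def standalone() -> SessionContext:
        return SessionContext(mode="standalone")

    @staticmethod
    def remote(url: str) -> SessionContext:
        return SessionContext(mode="remote", url=url)


# Either way, switching deployment is a single-line change for the caller.
ctx_a: SessionContext = StandaloneBallista()
ctx_b: SessionContext = Ballista.remote("df://localhost:50050")  # example URL
```

In this sketch, option A yields subclasses of the context type, while option B always returns the plain context type, which may be closer to what the Rust `#[staticmethod]` above would produce.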
   
   ## Make `standalone` optional dependency
   
   Can we make Ballista `standalone` an optional dependency?
   
   ```bash
   pip install pyballista 
   ```
   
   should install remote mode only
   
   ```bash
   pip install 'pyballista[standalone]'
   ```
   
   should install remote and standalone modes, providing an easy way to test
   ballista applications.
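
One way the package could behave when the extra is missing is to check for the standalone component at call time and fail with a helpful message. The sketch below only illustrates that pattern: the module name `pyballista_standalone` and the helper names are hypothetical, and how the extra would actually be packaged is left open.

```python
import importlib
import importlib.util


def has_module(name: str) -> bool:
    """Return True if `name` looks importable, without importing it."""
    try:
        return importlib.util.find_spec(name) is not None
    except ModuleNotFoundError:
        # Raised when a parent package of a dotted `name` is missing.
        return False


def require_standalone(module_name: str = "pyballista_standalone"):
    """Import the (hypothetical) standalone module or explain how to get it."""
    if not has_module(module_name):
        raise ImportError(
            "standalone mode is not installed; "
            "try: pip install 'pyballista[standalone]'"
        )
    return importlib.import_module(module_name)
```

With this shape, a remote-only install would still import cleanly, and only a call that needs standalone mode would raise.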
   
   ## Consider renaming python package
   
   As `pyballista` is not published, can we consider renaming the package to
   something like `datafusion-distributed` or `datafusion-ballista` to align
   with other packages?
   
   To me, `datafusion-distributed` makes sense, and we could consider renaming
   the `ballista` (client) crate to the same name, keeping the `ballista-`
   prefix for the executor and scheduler.
   
   **Describe alternatives you've considered**
   A clear and concise description of any alternative solutions or features 
you've considered.
   
   **Additional context**
   
   Do we need to keep the current pyballista implementation, or can we remove
   it with this PR?

