milenkovicm opened a new issue, #1081:
URL: https://github.com/apache/datafusion-ballista/issues/1081

   **Is your feature request related to a problem or challenge? Please describe 
what you are trying to do.**
   
   To reduce our code footprint, I would like to propose replacing 
`BallistaContext` with `SessionContext`.
   
   It would improve usability, as we would get most of the methods available in 
`SessionContext`. In addition, some DataFusion applications would be 
deployable to Ballista with a single-line change.
   
   ```rust
   use ballista::{extension::SessionContextExt, prelude::*};
   use datafusion::prelude::SessionContext;
   
   let ctx: SessionContext = SessionContext::ballista_standalone().await?;
   ```
   
   With write sinks now in place, we would also get write support, a feature 
Ballista did not have before.
   
   IMHO it would make a lot of sense to have a single API across DataFusion and 
Ballista.
   
   If the replacement is successful, it would enable us to reuse the DataFusion 
Python crate, eliminating the need to maintain Ballista Python. We would need 
to provide `SessionContext::ballista_standalone` and equivalent methods.
   
   ```python
   import datafusion
   import ballista.standalone
   from datafusion import col
   
   # create a context (datafusion context with ballista standalone enabled)
   ctx = ballista.standalone.SessionContext()
   ```
   
   There are clear benefits to deprecating `BallistaContext`. However, the 
decision may be problematic, as we could not hide `SessionContext` methods 
which do not work with Ballista. `SessionContext` may bring usability issues 
with UDF support, configuration, and basically any functionality which needs 
to be propagated across the cluster to work, and these may not be trivial to 
address. We may try to address them by "turning off" those methods in Ballista 
or just by documenting them; either way, some effort is needed. Or maybe it's 
not an issue at all?
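
   To make the "turning off" option more concrete, here is a minimal sketch of 
a runtime guard that rejects an operation when the context is distributed. 
Every name in it (`GuardedContext`, `Mode`, `ContextError`, `register_udf`, 
`has_udf`) is hypothetical and is not part of DataFusion or Ballista; how such 
a guard could be attached to the real `SessionContext` is exactly the open 
question raised above.

```rust
use std::collections::HashSet;
use std::fmt;

/// Hypothetical error type: reports which method is not available on a cluster.
#[derive(Debug, PartialEq)]
pub enum ContextError {
    NotSupportedOnCluster(&'static str),
}

impl fmt::Display for ContextError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ContextError::NotSupportedOnCluster(method) => {
                write!(f, "`{}` is not supported in distributed mode", method)
            }
        }
    }
}

/// Hypothetical execution mode of the context.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Mode {
    Local,
    Distributed,
}

/// Stand-in for a session context; NOT the real `SessionContext`.
pub struct GuardedContext {
    mode: Mode,
    udfs: HashSet<String>,
}

impl GuardedContext {
    pub fn new(mode: Mode) -> Self {
        Self { mode, udfs: HashSet::new() }
    }

    /// Registers a UDF by name, but refuses when the state would have to be
    /// propagated across a cluster.
    pub fn register_udf(&mut self, name: &str) -> Result<(), ContextError> {
        if self.mode == Mode::Distributed {
            return Err(ContextError::NotSupportedOnCluster("register_udf"));
        }
        self.udfs.insert(name.to_string());
        Ok(())
    }

    pub fn has_udf(&self, name: &str) -> bool {
        self.udfs.contains(name)
    }
}
```

   With a guard like this, an unsupported call fails early with a descriptive 
error instead of silently doing nothing on the executors. The documentation-only 
alternative needs no code but gives the user no such feedback.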
   
   **Describe the solution you'd like**
   
   Rough action plan:
   
   - Create `SessionContextExt`, which would expose methods for creating 
`standalone` and `remote` contexts, re-using `BallistaQueryPlanner`.
   - Verify basic `SQL` and `DataFrame` support.
   - Verify/fix write support (plans with a write sink are generated, but the 
write operation does not create valid files).
   - Update the Python crate to use `SessionContextExt`.
   - Deprecate `BallistaContext`.
   - Deprecate ballista context.
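
   The first step of the plan can be sketched as an extension trait that adds 
`standalone` and `remote` constructors to an existing context type by swapping 
in a different planner. This is a self-contained illustration only: `Context`, 
`Planner`, `LocalPlanner`, `DistributedPlanner` and `ContextExt` are stand-ins 
for `SessionContext`, DataFusion's planner trait, and `BallistaQueryPlanner`. 
The real signatures are async and differ from what is shown, and the 
`"in-process"` address is a placeholder.

```rust
use std::sync::Arc;

/// Stand-in for a query planner trait (the real one is async and richer).
pub trait Planner {
    fn describe(&self, sql: &str) -> String;
}

/// Plans and runs everything in the current process.
pub struct LocalPlanner;

impl Planner for LocalPlanner {
    fn describe(&self, sql: &str) -> String {
        format!("local: {}", sql)
    }
}

/// Stand-in for `BallistaQueryPlanner`: hands the query to a scheduler.
pub struct DistributedPlanner {
    scheduler_url: String,
}

impl Planner for DistributedPlanner {
    fn describe(&self, sql: &str) -> String {
        format!("submit to {}: {}", self.scheduler_url, sql)
    }
}

/// Stand-in for `SessionContext`: the planner is pluggable.
pub struct Context {
    planner: Arc<dyn Planner>,
}

impl Context {
    pub fn new() -> Self {
        Self::with_planner(Arc::new(LocalPlanner))
    }

    pub fn with_planner(planner: Arc<dyn Planner>) -> Self {
        Self { planner }
    }

    pub fn sql(&self, sql: &str) -> String {
        self.planner.describe(sql)
    }
}

/// Extension trait: adds constructors without modifying `Context` itself.
pub trait ContextExt: Sized {
    fn standalone() -> Self;
    fn remote(url: &str) -> Self;
}

impl ContextExt for Context {
    fn standalone() -> Self {
        // Assumption: standalone mode talks to a scheduler started in-process.
        let scheduler_url = "in-process".to_string();
        Context::with_planner(Arc::new(DistributedPlanner { scheduler_url }))
    }

    fn remote(url: &str) -> Self {
        let scheduler_url = url.to_string();
        Context::with_planner(Arc::new(DistributedPlanner { scheduler_url }))
    }
}
```

   Because the constructors live in a trait, existing code that already works 
with `Context` is untouched. Only the line that creates the context changes, 
for example from `Context::new()` to `Context::remote("df://localhost:50050")` 
(the URL is illustrative).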
   
   **Describe alternatives you've considered**
   
   **Additional context**
   
   Relates to #1068
   



