milenkovicm commented on code in PR #1100:
URL: 
https://github.com/apache/datafusion-ballista/pull/1100#discussion_r1845544416


##########
docs/source/user-guide/python.md:
##########
@@ -28,9 +28,25 @@ popular file formats files, run it in a distributed 
environment, and obtain the
 
 The following code demonstrates how to create a Ballista context and connect 
to a scheduler.
 
+If you are running a standalone cluster (runs locally), all you need to do is 
call the stand alone cluster method `standalone()` or your BallistaContext. If 
you are running a cluster in remote mode, you need to provide the URL 
`Ballista.remote("http://my-remote-ip:50050";)`.
+
 ```text
->>> import ballista
->>> ctx = ballista.BallistaContext("localhost", 50050)
+>>> from ballista import Ballista, BallistaBuilder
+>>> # for a standalone instance
+>>> # Ballista will initiate with an empty config
+>>> # set config variables with `set()`
+>>> ballista = BallistaBuilder()\

Review Comment:
   few issues: 
   
   - there is no `set`
   - there is no `build`
   - there is no `ballista.executor.cpus` & `ballista.shuffle.partitions`



##########
python/ballista/__init__.py:
##########
@@ -25,12 +25,18 @@
 
 import pyarrow as pa
 
-from .pyballista_internal import (
-    SessionContext,
+from .ballista_internal import (
+    BallistaBuilder,
+    #SessionConfig,
+    #SessionStateBuilder,
+    #SessionState
 )
 
 __version__ = importlib_metadata.version(__name__)
 
 __all__ = [
-    "SessionContext",
-]
+    "BallistaBuilder",
+    #"SessionConfig",

Review Comment:
   please remove commented structures



##########
docs/source/user-guide/python.md:
##########
@@ -103,14 +119,15 @@ The `explain` method can be used to show the logical and 
physical query plans fo
 The following example demonstrates creating arrays with PyArrow and then 
creating a Ballista DataFrame.
 
 ```python
-import ballista
+from ballista import Ballista, BallistaBuilder
 import pyarrow
 
 # an alias
+# TODO implement Functions
 f = ballista.functions
 
 # create a context
-ctx = ballista.BallistaContext("localhost", 50050)
+ctx = Ballista.standalone()

Review Comment:
   this line is not correct



##########
python/ballista/tests/test_context.py:
##########
@@ -15,7 +15,7 @@
 # specific language governing permissions and limitations
 # under the License.
 
-from pyballista import SessionContext
+from ballista import SessionContext

Review Comment:
   there is no `SessionContext` anymore, is there ? 
   
   ```bash
   python3 -m pytest
   ```
   
   ```
   
====================================================================================================
 ERRORS 
====================================================================================================
   
_______________________________________________________________________________ 
ERROR collecting ballista/tests/test_context.py 
________________________________________________________________________________
   ImportError while importing test module 
'/Users/marko/git/arrow-ballista/python/ballista/tests/test_context.py'.
   Hint: make sure your test modules/packages have valid Python names.
   Traceback:
   
/usr/local/Cellar/python@3.12/3.12.6/Frameworks/Python.framework/Versions/3.12/lib/python3.12/importlib/__init__.py:90:
 in import_module
       return _bootstrap._gcd_import(name[level:], package, level)
   ballista/tests/test_context.py:18: in <module>
       from ballista import SessionContext
   E   ImportError: cannot import name 'SessionContext' from 'ballista' 
(/Users/marko/git/arrow-ballista/python/ballista/__init__.py)
   
===========================================================================================
 short test summary info 
============================================================================================
   E
   ```



##########
docs/source/user-guide/python.md:
##########
@@ -28,9 +28,25 @@ popular file formats files, run it in a distributed 
environment, and obtain the
 
 The following code demonstrates how to create a Ballista context and connect 
to a scheduler.
 
+If you are running a standalone cluster (runs locally), all you need to do is 
call the stand alone cluster method `standalone()` or your BallistaContext. If 
you are running a cluster in remote mode, you need to provide the URL 
`Ballista.remote("http://my-remote-ip:50050";)`.
+
 ```text
->>> import ballista
->>> ctx = ballista.BallistaContext("localhost", 50050)
+>>> from ballista import Ballista, BallistaBuilder

Review Comment:
   there is no `Ballista` anymore



##########
docs/source/user-guide/python.md:
##########
@@ -28,9 +28,25 @@ popular file formats files, run it in a distributed 
environment, and obtain the
 
 The following code demonstrates how to create a Ballista context and connect 
to a scheduler.
 
+If you are running a standalone cluster (runs locally), all you need to do is 
call the stand alone cluster method `standalone()` or your BallistaContext. If 
you are running a cluster in remote mode, you need to provide the URL 
`Ballista.remote("http://my-remote-ip:50050";)`.
+
 ```text
->>> import ballista
->>> ctx = ballista.BallistaContext("localhost", 50050)
+>>> from ballista import Ballista, BallistaBuilder
+>>> # for a standalone instance
+>>> # Ballista will initiate with an empty config
+>>> # set config variables with `set()`
+>>> ballista = BallistaBuilder()\
+>>>    .set("ballista.job.name", "example ballista")\
+>>>    .set("ballista.shuffle.partitions", "16")\
+>>>    .set("ballista.executor.cpus", "4")\
+>>>    .build()
+>>> 
+>>> # Show the Ballista Config
+>>> print(ballista.show_config())

Review Comment:
   i would recommend remove `show_config` method, it is not needed 



##########
docs/source/user-guide/python.md:
##########
@@ -103,14 +119,15 @@ The `explain` method can be used to show the logical and 
physical query plans fo
 The following example demonstrates creating arrays with PyArrow and then 
creating a Ballista DataFrame.
 
 ```python
-import ballista
+from ballista import Ballista, BallistaBuilder

Review Comment:
   remove `Ballista` from examples, its gone 



##########
python/src/lib.rs:
##########
@@ -15,18 +15,107 @@
 // specific language governing permissions and limitations
 // under the License.
 
+use ballista::prelude::*;
+use datafusion::execution::SessionStateBuilder;
+use datafusion::prelude::*;
+use datafusion_python::context::PySessionContext as 
DataFusionPythonSessionContext;
+use datafusion_python::utils::wait_for_future;
+
+use std::collections::HashMap;
+
 use pyo3::prelude::*;
-pub mod context;
 mod utils;
-
-pub use crate::context::PySessionContext;
+use utils::to_pyerr;
 
 #[pymodule]
-fn pyballista_internal(_py: Python, m: Bound<'_, PyModule>) -> PyResult<()> {
+fn ballista_internal(_py: Python, m: Bound<'_, PyModule>) -> PyResult<()> {
     pyo3_log::init();
     // Ballista structs
-    m.add_class::<PySessionContext>()?;
+    m.add_class::<PyBallistaBuilder>()?;
     // DataFusion structs
     m.add_class::<datafusion_python::dataframe::PyDataFrame>()?;
+    // Ballista Config
+    /*

Review Comment:
   would suggest removing this comment 



##########
python/ballista/__init__.py:
##########
@@ -25,12 +25,18 @@
 
 import pyarrow as pa
 
-from .pyballista_internal import (
-    SessionContext,
+from .ballista_internal import (
+    BallistaBuilder,
+    #SessionConfig,

Review Comment:
   please remove commented structures



##########
python/README.md:
##########
@@ -29,8 +29,8 @@ part of the default Cargo workspace so that it doesn't cause 
overhead for mainta
 Creates a new context and connects to a Ballista scheduler process.
 
 ```python
-from pyballista import SessionContext
->>> ctx = SessionContext("localhost", 50050)
+from ballista import Ballista

Review Comment:
   not correct anymore 



##########
python/src/lib.rs:
##########
@@ -15,18 +15,107 @@
 // specific language governing permissions and limitations
 // under the License.
 
+use ballista::prelude::*;
+use datafusion::execution::SessionStateBuilder;
+use datafusion::prelude::*;
+use datafusion_python::context::PySessionContext as 
DataFusionPythonSessionContext;

Review Comment:
   is there need to rename `PySessionContext` ?



##########
python/ballista/context.py:
##########


Review Comment:
   what is the purpose of `context.py`? 



##########
python/examples/example.py:
##########
@@ -0,0 +1,33 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+from ballista import BallistaBuilder
+from datafusion.context import SessionContext
+
+# Ballista will initiate with an empty config
+# set config variables with `set()`
+ctx: SessionContext = BallistaBuilder()\
+    .config("ballista.job.name", "example ballista")\
+    .config("ballista.shuffle.partitions", "16")\

Review Comment:
   only valid config is `ballista.job.name` you could use 
`datafusion.execution.target_partitions` instead of shuffle parallelism. 
`ballista.executor.cpus` does not exist 



##########
python/src/lib.rs:
##########
@@ -15,18 +15,107 @@
 // specific language governing permissions and limitations
 // under the License.
 
+use ballista::prelude::*;
+use datafusion::execution::SessionStateBuilder;
+use datafusion::prelude::*;
+use datafusion_python::context::PySessionContext as 
DataFusionPythonSessionContext;
+use datafusion_python::utils::wait_for_future;
+
+use std::collections::HashMap;
+
 use pyo3::prelude::*;
-pub mod context;
 mod utils;
-
-pub use crate::context::PySessionContext;
+use utils::to_pyerr;
 
 #[pymodule]
-fn pyballista_internal(_py: Python, m: Bound<'_, PyModule>) -> PyResult<()> {
+fn ballista_internal(_py: Python, m: Bound<'_, PyModule>) -> PyResult<()> {
     pyo3_log::init();
     // Ballista structs
-    m.add_class::<PySessionContext>()?;
+    m.add_class::<PyBallistaBuilder>()?;
     // DataFusion structs
     m.add_class::<datafusion_python::dataframe::PyDataFrame>()?;
+    // Ballista Config
+    /*
+    // Future implementation will include more state and config options
+    m.add_class::<PySessionStateBuilder>()?;
+    m.add_class::<PySessionState>()?;
+    m.add_class::<PySessionConfig>()?;
+    */
     Ok(())
 }
+
+// Ballista Builder will take a HasMap/Dict Cionfg
+#[pyclass(name = "BallistaBuilder", module = "ballista", subclass)]
+pub struct PyBallistaBuilder {
+    conf: HashMap<String, String>,
+}
+
+#[pymethods]
+impl PyBallistaBuilder {
+    #[new]
+    pub fn new() -> Self {
+        Self {
+            conf: HashMap::new(),
+        }
+    }
+
+    pub fn config(
+        mut slf: PyRefMut<'_, Self>,
+        k: &str,
+        v: &str,
+        py: Python,
+    ) -> PyResult<PyObject> {
+        slf.conf.insert(k.into(), v.into());
+
+        Ok(slf.into_py(py))
+    }
+
+    pub fn show_config(&self) {

Review Comment:
   I would suggest removing `show_config` 



##########
python/src/lib.rs:
##########
@@ -15,18 +15,107 @@
 // specific language governing permissions and limitations
 // under the License.
 
+use ballista::prelude::*;
+use datafusion::execution::SessionStateBuilder;
+use datafusion::prelude::*;
+use datafusion_python::context::PySessionContext as 
DataFusionPythonSessionContext;
+use datafusion_python::utils::wait_for_future;
+
+use std::collections::HashMap;
+
 use pyo3::prelude::*;
-pub mod context;
 mod utils;
-
-pub use crate::context::PySessionContext;
+use utils::to_pyerr;
 
 #[pymodule]
-fn pyballista_internal(_py: Python, m: Bound<'_, PyModule>) -> PyResult<()> {
+fn ballista_internal(_py: Python, m: Bound<'_, PyModule>) -> PyResult<()> {
     pyo3_log::init();
     // Ballista structs
-    m.add_class::<PySessionContext>()?;
+    m.add_class::<PyBallistaBuilder>()?;
     // DataFusion structs
     m.add_class::<datafusion_python::dataframe::PyDataFrame>()?;
+    // Ballista Config
+    /*
+    // Future implementation will include more state and config options
+    m.add_class::<PySessionStateBuilder>()?;
+    m.add_class::<PySessionState>()?;
+    m.add_class::<PySessionConfig>()?;
+    */
     Ok(())
 }
+
+// Ballista Builder will take a HasMap/Dict Cionfg
+#[pyclass(name = "BallistaBuilder", module = "ballista", subclass)]
+pub struct PyBallistaBuilder {
+    conf: HashMap<String, String>,
+}
+
+#[pymethods]
+impl PyBallistaBuilder {
+    #[new]
+    pub fn new() -> Self {
+        Self {
+            conf: HashMap::new(),
+        }
+    }
+
+    pub fn config(
+        mut slf: PyRefMut<'_, Self>,
+        k: &str,
+        v: &str,
+        py: Python,
+    ) -> PyResult<PyObject> {
+        slf.conf.insert(k.into(), v.into());
+
+        Ok(slf.into_py(py))
+    }
+
+    pub fn show_config(&self) {
+        println!("Ballista Config:");
+        for ele in self.conf.iter() {
+            println!("\t{}: {}", ele.0, ele.1)
+        }
+    }
+
+    /// Construct the standalone instance from the SessionContext
+    pub fn standalone(&self, py: Python) -> 
PyResult<DataFusionPythonSessionContext> {
+        // Build the config
+        let config: SessionConfig = 
SessionConfig::from_string_hash_map(&self.conf)?;
+        // Build the state
+        let state = SessionStateBuilder::new()
+            .with_config(config)
+            .with_default_features()
+            .build();
+        // Build the context
+        let standalone_session = SessionContext::standalone_with_state(state);
+
+        // SessionContext is an async function
+        let ctx = wait_for_future(py, standalone_session)?;
+
+        // Convert the SessionContext into a Python SessionContext
+        Ok(ctx.into())
+    }
+
+    /// Construct the remote instance from the SessionContext
+    pub fn remote(

Review Comment:
   this is  great!



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org

Reply via email to