[arrow-ballista] branch master updated: Add basic Python docs and enable information_schema in Python context (#170)

agrove Mon, 29 Aug 2022 19:38:13 -0700

This is an automated email from the ASF dual-hosted git repository.

agrove pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow-ballista.git



The following commit(s) were added to refs/heads/master by this push:
     new a465c0db Add basic Python docs and enable information_schema in Python context (#170)
a465c0db is described below

commit a465c0dbfd91fd09ee9ac7acf4db091eb0355902
Author: Andy Grove <[email protected]>
AuthorDate: Mon Aug 29 20:38:02 2022 -0600

    Add basic Python docs and enable information_schema in Python context (#170)
---
 docs/source/user-guide/python.md | 72 ++++++++++++++++++++++++++++++++++++----
 python/src/ballista_context.rs   |  1 +
 2 files changed, 66 insertions(+), 7 deletions(-)

diff --git a/docs/source/user-guide/python.md b/docs/source/user-guide/python.md
index 0d42c450..3bd4fe50 100644
--- a/docs/source/user-guide/python.md
+++ b/docs/source/user-guide/python.md
@@ -17,16 +17,74 @@
   under the License.
 -->
 
-# Python
+# Ballista Python Bindings
+
+Ballista provides Python bindings, allowing SQL and DataFrame queries to be executed from the Python shell.
+
+## Connecting to a Cluster
+
+The following code demonstrates how to create a Ballista context and connect to a scheduler.
 
 ```text
 >>> import ballista
 >>> ctx = ballista.BallistaContext("localhost", 50050)
->>> df = ctx.sql("SELECT 1")
+```
+
+## Registering Tables
+
+Tables can be registered against the context by calling one of the `register` methods, or by executing SQL.
+
+```text
+>>> ctx.register_parquet("trips", "/mnt/bigdata/nyctaxi")
+```
+
+```text
+>>> ctx.sql("CREATE EXTERNAL TABLE trips STORED AS PARQUET LOCATION '/mnt/bigdata/nyctaxi'")
+```
+
+## Executing Queries
+
+The `sql` method creates a `DataFrame`. The query is executed only when an action such as `show` or `collect` is called.
+
+### Showing Query Results
+
+```text
+>>> df = ctx.sql("SELECT count(*) FROM trips")
 >>> df.show()
-+----------+
-| Int64(1) |
-+----------+
-| 1        |
-+----------+
++-----------------+
+| COUNT(UInt8(1)) |
++-----------------+
+| 9071244         |
++-----------------+
+```
+
+### Collecting Query Results
+
+The `collect` method executes the query and returns the results in
+[PyArrow](https://arrow.apache.org/docs/python/index.html) record batches.
+
+```text
+>>> df = ctx.sql("SELECT count(*) FROM trips")
+>>> df.collect()
+[pyarrow.RecordBatch
+COUNT(UInt8(1)): int64]
+```
+
+### Viewing Query Plans
+
+The `explain` method can be used to show the logical and physical query plans for a query.
+
+```text
+>>> df.explain()
++---------------+-------------------------------------------------------------+
+| plan_type     | plan                                                        |
++---------------+-------------------------------------------------------------+
+| logical_plan  | Projection: #COUNT(UInt8(1))                                |
+|               |   Aggregate: groupBy=[[]], aggr=[[COUNT(UInt8(1))]]         |
+|               |     TableScan: trips projection=[VendorID]                  |
+| physical_plan | ProjectionExec: expr=[COUNT(UInt8(1))@0 as COUNT(UInt8(1))] |
+|               |   ProjectionExec: expr=[9071244 as COUNT(UInt8(1))]         |
+|               |     EmptyExec: produce_one_row=true                         |
+|               |                                                             |
++---------------+-------------------------------------------------------------+
 ```
diff --git a/python/src/ballista_context.rs b/python/src/ballista_context.rs
index 4fd91c62..40e389e7 100644
--- a/python/src/ballista_context.rs
+++ b/python/src/ballista_context.rs
@@ -42,6 +42,7 @@ impl PyBallistaContext {
     fn new(py: Python, host: &str, port: u16) -> PyResult<Self> {
         let config = BallistaConfig::builder()
             .set("ballista.shuffle.partitions", "4")
+            .set("ballista.with_information_schema", "true")
             .build()
             .map_err(BallistaError::from)?;
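
The `CREATE EXTERNAL TABLE` statement shown in the new docs can also be assembled programmatically before it is handed to `ctx.sql`. The following is a minimal sketch, not part of this commit: the helper `create_external_table_sql` and the constant `LIST_TABLES_SQL` are hypothetical names introduced for illustration, and the sketch assumes that doubling single quotes is accepted as string-literal escaping and that `information_schema.tables` exposes a `table_name` column once `ballista.with_information_schema` is enabled.

```python
def create_external_table_sql(name: str, location: str, file_format: str = "PARQUET") -> str:
    """Build a CREATE EXTERNAL TABLE statement (hypothetical helper, not a Ballista API)."""
    # Accept only simple identifiers so the table name cannot inject SQL.
    if not name.isidentifier():
        raise ValueError(f"not a simple identifier: {name!r}")
    # Assumption: single quotes inside the path are escaped by doubling them.
    escaped = location.replace("'", "''")
    return (
        f"CREATE EXTERNAL TABLE {name} "
        f"STORED AS {file_format.upper()} LOCATION '{escaped}'"
    )


# Assumption: with information_schema enabled, registered tables can be listed like this.
LIST_TABLES_SQL = "SELECT table_name FROM information_schema.tables"

stmt = create_external_table_sql("trips", "/mnt/bigdata/nyctaxi")
print(stmt)
```

With a context created as in the docs, the resulting strings would be passed as `ctx.sql(stmt)` and `ctx.sql(LIST_TABLES_SQL).show()`; those calls are omitted here so the sketch runs without a scheduler.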
