This is an automated email from the ASF dual-hosted git repository.

github-bot pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/datafusion-ballista.git


The following commit(s) were added to refs/heads/asf-site by this push:
     new 74ddb6f2 Publish built docs triggered by 5ec9ae6b0b5855e7744aa93e6a9acfee25f5ceab
74ddb6f2 is described below

commit 74ddb6f296df6ddc78138bd588ab8260baa3c675
Author: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
AuthorDate: Sat Jan 17 19:19:17 2026 +0000

    Publish built docs triggered by 5ec9ae6b0b5855e7744aa93e6a9acfee25f5ceab
---
 _sources/contributors-guide/architecture.md.txt    |  3 ---
 _sources/contributors-guide/development.md.txt     | 11 +++++++++++
 _sources/user-guide/cli.md.txt                     |  2 +-
 _sources/user-guide/configs.md.txt                 | 17 ++++++++---------
 .../user-guide/deployment/docker-compose.md.txt    | 10 +++-------
 _sources/user-guide/deployment/docker.md.txt       | 21 +++++++++------------
 _sources/user-guide/deployment/kubernetes.md.txt   | 22 +++++++++++-----------
 _sources/user-guide/python.md.txt                  | 11 ++++-------
 _sources/user-guide/tuning-guide.md.txt            |  4 ++--
 contributors-guide/architecture.html               |  2 --
 contributors-guide/development.html                | 16 ++++++++++++++++
 searchindex.js                                     |  2 +-
 user-guide/cli.html                                |  2 +-
 user-guide/configs.html                            |  7 +++----
 user-guide/deployment/docker-compose.html          | 10 +++-------
 user-guide/deployment/docker.html                  | 21 +++++++++------------
 user-guide/deployment/kubernetes.html              | 22 +++++++++++-----------
 user-guide/python.html                             | 11 ++++-------
 user-guide/tuning-guide.html                       |  4 ++--
 19 files changed, 99 insertions(+), 99 deletions(-)
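
Note on the `task-distribution` option documented in the `configs.md.txt` hunk of this diff: the table lists `bias`, `round-robin`, and `consistent-hash` as possible values without describing them. The sketch below shows generic textbook forms of round-robin and consistent-hash assignment in Python. It is an illustration under stated assumptions, not Ballista's scheduler code: the function and class names, the executor IDs, and the choice of SHA-256 with 16 virtual nodes per executor are all hypothetical. The `bias` policy is not sketched.

```python
"""Illustrative sketch of two generic task-distribution strategies.

NOTE: hypothetical code for explanation only; it does not reproduce
Ballista's scheduler implementation or its configuration handling.
"""
import hashlib
from bisect import bisect_right
from itertools import cycle


def round_robin(tasks, executors):
    """Assign tasks to executors in rotating order."""
    rotation = cycle(executors)
    return {task: next(rotation) for task in tasks}


class ConsistentHashRing:
    """Minimal hash ring with several virtual nodes per executor."""

    def __init__(self, executors, replicas=16):
        # Each executor contributes `replicas` points on the ring.
        self._ring = sorted(
            (self._hash(f"{executor}#{i}"), executor)
            for executor in executors
            for i in range(replicas)
        )
        self._points = [point for point, _ in self._ring]

    @staticmethod
    def _hash(value):
        # First 8 bytes of SHA-256, read as an unsigned integer.
        return int.from_bytes(hashlib.sha256(value.encode()).digest()[:8], "big")

    def lookup(self, key):
        """Return the executor owning the first ring point after the key's hash."""
        index = bisect_right(self._points, self._hash(key)) % len(self._ring)
        return self._ring[index][1]


if __name__ == "__main__":
    executors = ["executor-a", "executor-b", "executor-c"]  # hypothetical IDs
    print(round_robin(["t1", "t2", "t3", "t4"], executors))
    ring = ConsistentHashRing(executors)
    print({key: ring.lookup(key) for key in ["part-0.parquet", "part-1.parquet"]})
```

Round-robin spreads tasks evenly in arrival order. The ring keeps a given key on the same executor for as long as that executor stays in the cluster, which is the property that usually motivates consistent hashing.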

diff --git a/_sources/contributors-guide/architecture.md.txt b/_sources/contributors-guide/architecture.md.txt
index b0e00730..b25321d0 100644
--- a/_sources/contributors-guide/architecture.md.txt
+++ b/_sources/contributors-guide/architecture.md.txt
@@ -80,9 +80,6 @@ plan or a SQL string. The scheduler then creates an execution graph, which conta
 stages (pipelines) that can be scheduled independently. This process is explained in detail in the Distributed
 Query Scheduling section of this guide.
 
-It is possible to have multiple schedulers running with shared state in etcd, so that jobs can continue to run
-even if a scheduler process fails.
-
 ### Executor
 
 The executor processes connect to a scheduler and poll for tasks to perform. These tasks are physical plans in
diff --git a/_sources/contributors-guide/development.md.txt b/_sources/contributors-guide/development.md.txt
index a21595b1..feefc8a6 100644
--- a/_sources/contributors-guide/development.md.txt
+++ b/_sources/contributors-guide/development.md.txt
@@ -65,3 +65,14 @@ cargo test
 cd examples
 cargo run --example standalone_sql --features=ballista/standalone
 ```
+
+## Benchmarking
+
+For performance testing and benchmarking with TPC-H and other datasets, see the [benchmarks README](../../../benchmarks/README.md).
+
+This includes instructions for:
+
+- Generating TPC-H test data
+- Running benchmarks against DataFusion and Ballista
+- Comparing performance with Apache Spark
+- Running load tests
diff --git a/_sources/user-guide/cli.md.txt b/_sources/user-guide/cli.md.txt
index 213f6034..597bc195 100644
--- a/_sources/user-guide/cli.md.txt
+++ b/_sources/user-guide/cli.md.txt
@@ -71,7 +71,7 @@ It is also possible to run the CLI in standalone mode, where it will create a sc
 ```bash
 $ ballista-cli
 
-Ballista CLI v8.0.0
+Ballista CLI v51.0.0
 
 > CREATE EXTERNAL TABLE foo (a INT, b INT) STORED AS CSV LOCATION 'data.csv';
 0 rows in set. Query took 0.001 seconds.
diff --git a/_sources/user-guide/configs.md.txt b/_sources/user-guide/configs.md.txt
index 5e909e15..56b847e1 100644
--- a/_sources/user-guide/configs.md.txt
+++ b/_sources/user-guide/configs.md.txt
@@ -96,14 +96,13 @@ manage the whole cluster are also needed to be taken care of.
 _Example: Specifying configuration options when starting the scheduler_
 
 ```shell
-./ballista-scheduler --scheduler-policy push-staged --event-loop-buffer-size 1000000 --executor-slots-policy
-round-robin-local
+./ballista-scheduler --scheduler-policy push-staged --event-loop-buffer-size 1000000 --task-distribution round-robin
 
-| key                                          | type   | default     | description                                                                                                                                                                     |
-| -------------------------------------------- | ------ | ----------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
-| scheduler-policy                             | Utf8   | pull-staged | Sets the task scheduling policy for the scheduler, possible values: pull-staged, push-staged.                                                                                   |
-| event-loop-buffer-size                       | UInt32 | 10000       | Sets the event loop buffer size. for a system of high throughput, a larger value like 1000000 is recommended.                                                                   |
-| executor-slots-policy                        | Utf8   | bias        | Sets the executor slots policy for the scheduler, possible values: bias, round-robin, round-robin-local. For a cluster with single scheduler, round-robin-local is recommended. |
-| finished-job-data-clean-up-interval-seconds  | UInt64 | 300         | Sets the delayed interval for cleaning up finished job data, mainly the shuffle data, 0 means the cleaning up is disabled.                                                      |
-| finished-job-state-clean-up-interval-seconds | UInt64 | 3600        | Sets the delayed interval for cleaning up finished job state stored in the backend, 0 means the cleaning up is disabled.                                                        |
+| key                                          | type   | default     | description                                                                                                                |
+| -------------------------------------------- | ------ | ----------- | -------------------------------------------------------------------------------------------------------------------------- |
+| scheduler-policy                             | Utf8   | pull-staged | Sets the task scheduling policy for the scheduler, possible values: pull-staged, push-staged.                              |
+| event-loop-buffer-size                       | UInt32 | 10000       | Sets the event loop buffer size. for a system of high throughput, a larger value like 1000000 is recommended.              |
+| task-distribution                            | Utf8   | bias        | Sets the task distribution policy for the scheduler, possible values: bias, round-robin, consistent-hash.                  |
+| finished-job-data-clean-up-interval-seconds  | UInt64 | 300         | Sets the delayed interval for cleaning up finished job data, mainly the shuffle data, 0 means the cleaning up is disabled. |
+| finished-job-state-clean-up-interval-seconds | UInt64 | 3600        | Sets the delayed interval for cleaning up finished job state stored in the backend, 0 means the cleaning up is disabled.   |
diff --git a/_sources/user-guide/deployment/docker-compose.md.txt b/_sources/user-guide/deployment/docker-compose.md.txt
index 67f40b7a..96f69d69 100644
--- a/_sources/user-guide/deployment/docker-compose.md.txt
+++ b/_sources/user-guide/deployment/docker-compose.md.txt
@@ -39,15 +39,11 @@ This should show output similar to the following:
 ```bash
 $ docker-compose up
 Creating network "ballista-benchmarks_default" with the default driver
-Creating ballista-benchmarks_etcd_1 ... done
 Creating ballista-benchmarks_ballista-scheduler_1 ... done
 Creating ballista-benchmarks_ballista-executor_1  ... done
-Attaching to ballista-benchmarks_etcd_1, ballista-benchmarks_ballista-scheduler_1, ballista-benchmarks_ballista-executor_1
-ballista-executor_1   | [2021-08-28T15:55:22Z INFO  ballista_executor] Running with config:
-ballista-executor_1   | [2021-08-28T15:55:22Z INFO  ballista_executor] work_dir: /tmp/.tmpLVx39c
-ballista-executor_1   | [2021-08-28T15:55:22Z INFO  ballista_executor] concurrent_tasks: 4
-ballista-scheduler_1  | [2021-08-28T15:55:22Z INFO  ballista_scheduler] Ballista v0.12.0 Scheduler listening on 0.0.0.0:50050
-ballista-executor_1   | [2021-08-28T15:55:22Z INFO  ballista_executor] Ballista v0.12.0 Rust Executor listening on 0.0.0.0:50051
+Attaching to ballista-benchmarks_ballista-scheduler_1, ballista-benchmarks_ballista-executor_1
+ballista-scheduler_1  | INFO ballista_scheduler: Ballista v51.0.0 Scheduler listening on 0.0.0.0:50050
+ballista-executor_1   | INFO ballista_executor: Ballista v51.0.0 Rust Executor listening on 0.0.0.0:50051
 ```
 
 The scheduler listens on port 50050 and this is the port that clients will need to connect to.
diff --git a/_sources/user-guide/deployment/docker.md.txt b/_sources/user-guide/deployment/docker.md.txt
index a0542377..cf5488a6 100644
--- a/_sources/user-guide/deployment/docker.md.txt
+++ b/_sources/user-guide/deployment/docker.md.txt
@@ -67,13 +67,10 @@ Run `docker logs CONTAINER_ID` to check the output from the process:
 
 ```
 $ docker logs a756055576f3
-2024-02-03T14:49:47.904571Z  INFO main ThreadId(01) ballista_scheduler::cluster: Initializing Sled database in temp directory
-
-2024-02-03T14:49:47.924679Z  INFO main ThreadId(01) ballista_scheduler::scheduler_process: Ballista v0.12.0 Scheduler listening on 0.0.0.0:50050
-2024-02-03T14:49:47.924709Z  INFO main ThreadId(01) ballista_scheduler::scheduler_process: Starting Scheduler grpc server with task scheduling policy of PullStaged
-2024-02-03T14:49:47.925261Z  INFO main ThreadId(01) ballista_scheduler::cluster::kv: Initializing heartbeat listener
-2024-02-03T14:49:47.925476Z  INFO main ThreadId(01) ballista_scheduler::scheduler_server::query_stage_scheduler: Starting QueryStageScheduler
-2024-02-03T14:49:47.925587Z  INFO tokio-runtime-worker ThreadId(47) ballista_core::event_loop: Starting the event loop query_stage
+INFO ballista_scheduler::scheduler_process: Ballista v51.0.0 Scheduler listening on 0.0.0.0:50050
+INFO ballista_scheduler::scheduler_process: Starting Scheduler grpc server with task scheduling policy of PullStaged
+INFO ballista_scheduler::scheduler_server::query_stage_scheduler: Starting QueryStageScheduler
+INFO ballista_core::event_loop: Starting the event loop query_stage
 ```
 
 ### Start Executors
@@ -99,11 +96,11 @@ Use `docker logs CONTAINER_ID` to check the output from the executor(s):
 
 ```
 $ docker logs fb8b530cee6d
-2024-02-03T14:50:24.061607Z  INFO main ThreadId(01) ballista_executor::executor_process: Running with config:
-2024-02-03T14:50:24.061649Z  INFO main ThreadId(01) ballista_executor::executor_process: work_dir: /tmp/.tmpAkP3pZ
-2024-02-03T14:50:24.061655Z  INFO main ThreadId(01) ballista_executor::executor_process: concurrent_tasks: 48
-2024-02-03T14:50:24.063256Z  INFO tokio-runtime-worker ThreadId(44) ballista_executor::executor_process: Ballista v0.12.0 Rust Executor Flight Server listening on 0.0.0.0:50051
-2024-02-03T14:50:24.063281Z  INFO tokio-runtime-worker ThreadId(47) ballista_executor::execution_loop: Starting poll work loop with scheduler
+INFO ballista_executor::executor_process: Running with config:
+INFO ballista_executor::executor_process: work_dir: /tmp/.tmpAkP3pZ
+INFO ballista_executor::executor_process: concurrent_tasks: 48
+INFO ballista_executor::executor_process: Ballista v51.0.0 Rust Executor Flight Server listening on 0.0.0.0:50051
+INFO ballista_executor::execution_loop: Starting poll work loop with scheduler
 ```
 
 ## Connect from the CLI
diff --git a/_sources/user-guide/deployment/kubernetes.md.txt b/_sources/user-guide/deployment/kubernetes.md.txt
index d3062ed8..3e25d1f9 100644
--- a/_sources/user-guide/deployment/kubernetes.md.txt
+++ b/_sources/user-guide/deployment/kubernetes.md.txt
@@ -48,10 +48,10 @@ To create the required Docker images please refer to the [docker deployment page
 Once the images have been built, you can retag them and can push them to your favourite Docker registry.
 
 ```bash
-docker tag apache/datafusion-ballista-scheduler:0.12.0 <your-repo>/datafusion-ballista-scheduler:0.12.0
-docker tag apache/datafusion-ballista-executor:0.12.0 <your-repo>/datafusion-ballista-executor:0.12.0
-docker push <your-repo>/datafusion-ballista-scheduler:0.12.0
-docker push <your-repo>/datafusion-ballista-executor:0.12.0
+docker tag apache/datafusion-ballista-scheduler:latest <your-repo>/datafusion-ballista-scheduler:latest
+docker tag apache/datafusion-ballista-executor:latest <your-repo>/datafusion-ballista-executor:latest
+docker push <your-repo>/datafusion-ballista-scheduler:latest
+docker push <your-repo>/datafusion-ballista-executor:latest
 ```
 
 ## Create Persistent Volume and Persistent Volume Claim
@@ -139,7 +139,7 @@ spec:
     spec:
       containers:
         - name: ballista-scheduler
-          image: <your-repo>/datafusion-ballista-scheduler:0.12.0
+          image: <your-repo>/datafusion-ballista-scheduler:latest
           args: ["--bind-port=50050"]
           ports:
             - containerPort: 50050
@@ -169,7 +169,7 @@ spec:
     spec:
       containers:
         - name: ballista-executor
-          image: <your-repo>/datafusion-ballista-executor:0.12.0
+          image: <your-repo>/datafusion-ballista-executor:latest
           args:
             - "--bind-port=50051"
             - "--scheduler-host=ballista-scheduler"
@@ -208,13 +208,13 @@ ballista-executor-78cc5b6486-7crdm   0/1     Pending   0          42s
 ballista-scheduler-879f874c5-rnbd6   0/1     Pending   0          42s
 ```
 
-You can view the scheduler logs with `kubectl logs ballista-scheduler-0`:
+You can view the scheduler logs with `kubectl logs ballista-scheduler-<pod-id>`:
 
 ```
-$ kubectl logs ballista-scheduler-0
-[2021-02-19T00:24:01Z INFO  scheduler] Ballista v0.7.0 Scheduler listening on 0.0.0.0:50050
-[2021-02-19T00:24:16Z INFO  ballista::scheduler] Received register_executor request for ExecutorMetadata { id: "b5e81711-1c5c-46ec-8522-d8b359793188", host: "10.1.23.149", port: 50051 }
-[2021-02-19T00:24:17Z INFO  ballista::scheduler] Received register_executor request for ExecutorMetadata { id: "816e4502-a876-4ed8-b33f-86d243dcf63f", host: "10.1.23.150", port: 50051 }
+$ kubectl logs ballista-scheduler-<pod-id>
+INFO ballista_scheduler::scheduler_process: Ballista v51.0.0 Scheduler listening on 0.0.0.0:50050
+INFO ballista_scheduler::scheduler_server::grpc: Received register_executor request for ExecutorMetadata { id: "b5e81711-1c5c-46ec-8522-d8b359793188", host: "10.1.23.149", port: 50051 }
+INFO ballista_scheduler::scheduler_server::grpc: Received register_executor request for ExecutorMetadata { id: "816e4502-a876-4ed8-b33f-86d243dcf63f", host: "10.1.23.150", port: 50051 }
 ```
 
 ## Port Forwarding
diff --git a/_sources/user-guide/python.md.txt b/_sources/user-guide/python.md.txt
index f17ac68d..e7d13968 100644
--- a/_sources/user-guide/python.md.txt
+++ b/_sources/user-guide/python.md.txt
@@ -117,12 +117,8 @@ The following example demonstrates creating arrays with PyArrow and then creatin
 from ballista import BallistaBuilder
 import pyarrow
 
-# an alias
-# TODO implement Functions
-f = ballista.functions
-
 # create a context
-ctx = Ballista().standalone()
+ctx = BallistaBuilder().standalone()
 
 # create a RecordBatch and a new DataFrame from it
 batch = pyarrow.RecordBatch.from_arrays(
@@ -132,9 +128,10 @@ batch = pyarrow.RecordBatch.from_arrays(
 df = ctx.create_dataframe([[batch]])
 
 # create a new statement
+from datafusion import col
 df = df.select(
-    f.col("a") + f.col("b"),
-    f.col("a") - f.col("b"),
+    col("a") + col("b"),
+    col("a") - col("b"),
 )
 
 # execute and collect the first (and only) batch
diff --git a/_sources/user-guide/tuning-guide.md.txt b/_sources/user-guide/tuning-guide.md.txt
index 22955b44..fe4363df 100644
--- a/_sources/user-guide/tuning-guide.md.txt
+++ b/_sources/user-guide/tuning-guide.md.txt
@@ -73,8 +73,8 @@ which is the best for your use case.
 
 Pull-based scheduling works in a similar way to Apache Spark and push-based scheduling can result in lower latency.
 
-The scheduling policy can be specified in the `--scheduler_policy` parameter when starting the scheduler and executor
-processes. The default is `pull-based`.
+The scheduling policy can be specified in the `--scheduler-policy` parameter when starting the scheduler and executor
+processes. The default is `pull-staged`.
 
 ## Viewing Query Plans and Metrics
 
diff --git a/contributors-guide/architecture.html b/contributors-guide/architecture.html
index 367a7788..d21c1d12 100644
--- a/contributors-guide/architecture.html
+++ b/contributors-guide/architecture.html
@@ -430,8 +430,6 @@ between the executor(s) and the scheduler for fetching tasks and reporting task
 plan or a SQL string. The scheduler then creates an execution graph, which contains a physical plan broken down into
 stages (pipelines) that can be scheduled independently. This process is explained in detail in the Distributed
 Query Scheduling section of this guide.</p>
-<p>It is possible to have multiple schedulers running with shared state in etcd, so that jobs can continue to run
-even if a scheduler process fails.</p>
 </section>
 <section id="executor">
 <h3>Executor<a class="headerlink" href="#executor" title="Link to this heading">¶</a></h3>
diff --git a/contributors-guide/development.html b/contributors-guide/development.html
index 476188da..364f1825 100644
--- a/contributors-guide/development.html
+++ b/contributors-guide/development.html
@@ -275,6 +275,11 @@
    Running the examples
   </a>
  </li>
+ <li class="toc-h2 nav-item toc-entry">
+  <a class="reference internal nav-link" href="#benchmarking">
+   Benchmarking
+  </a>
+ </li>
 </ul>
 
 </nav>
@@ -366,6 +371,17 @@ cargo<span class="w"> </span>run<span class="w"> </span>--example<span class="w"
 </pre></div>
 </div>
 </section>
+<section id="benchmarking">
+<h2>Benchmarking<a class="headerlink" href="#benchmarking" title="Link to this heading">¶</a></h2>
+<p>For performance testing and benchmarking with TPC-H and other datasets, see the <span class="xref myst">benchmarks README</span>.</p>
+<p>This includes instructions for:</p>
+<ul class="simple">
+<li><p>Generating TPC-H test data</p></li>
+<li><p>Running benchmarks against DataFusion and Ballista</p></li>
+<li><p>Comparing performance with Apache Spark</p></li>
+<li><p>Running load tests</p></li>
+</ul>
+</section>
 </section>
 
 
diff --git a/searchindex.js b/searchindex.js
index 04b10380..325691b0 100644
--- a/searchindex.js
+++ b/searchindex.js
@@ -1 +1 @@
-Search.setIndex({"alltitles": {"Apache DataFusion Ballista": [[4, null]], "Arrow-native": [[1, "arrow-native"]], "Autoscaling Executors": [[11, "autoscaling-executors"]], "Ballista Architecture": [[1, null]], "Ballista Code Organization": [[2, null]], "Ballista Command-line Interface": [[5, null]], "Ballista Configuration Settings": [[6, "ballista-configuration-settings"]], "Ballista Development": [[3, null]], "Ballista Python Bindings": [[17, null]], "Ballista Quickstart": [[12, null]], [...]
\ No newline at end of file
+Search.setIndex({"alltitles": {"Apache DataFusion Ballista": [[4, null]], "Arrow-native": [[1, "arrow-native"]], "Autoscaling Executors": [[11, "autoscaling-executors"]], "Ballista Architecture": [[1, null]], "Ballista Code Organization": [[2, null]], "Ballista Command-line Interface": [[5, null]], "Ballista Configuration Settings": [[6, "ballista-configuration-settings"]], "Ballista Development": [[3, null]], "Ballista Python Bindings": [[17, null]], "Ballista Quickstart": [[12, null]], [...]
\ No newline at end of file
diff --git a/user-guide/cli.html b/user-guide/cli.html
index e25381fd..5227a93f 100644
--- a/user-guide/cli.html
+++ b/user-guide/cli.html
@@ -373,7 +373,7 @@ process.</p>
 <p>It is also possible to run the CLI in standalone mode, where it will create a scheduler and executor in-process.</p>
 <div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span>ballista-cli
 
-Ballista<span class="w"> </span>CLI<span class="w"> </span>v8.0.0
+Ballista<span class="w"> </span>CLI<span class="w"> </span>v51.0.0
 
 &gt;<span class="w"> </span>CREATE<span class="w"> </span>EXTERNAL<span class="w"> </span>TABLE<span class="w"> </span>foo<span class="w"> </span><span class="o">(</span>a<span class="w"> </span>INT,<span class="w"> </span>b<span class="w"> </span>INT<span class="o">)</span><span class="w"> </span>STORED<span class="w"> </span>AS<span class="w"> </span>CSV<span class="w"> </span>LOCATION<span class="w"> </span><span class="s1">&#39;data.csv&#39;</span><span class="p">;</span>
 <span class="m">0</span><span class="w"> </span>rows<span class="w"> </span><span class="k">in</span><span class="w"> </span>set.<span class="w"> </span>Query<span class="w"> </span>took<span class="w"> </span><span class="m">0</span>.001<span class="w"> </span>seconds.
diff --git a/user-guide/configs.html b/user-guide/configs.html
index 4671f5fb..a888eefe 100644
--- a/user-guide/configs.html
+++ b/user-guide/configs.html
@@ -378,8 +378,7 @@
 <p>Besides the BallistaContext configuration settings, a few configuration settings for the Ballista scheduler to better
 manage the whole cluster are also needed to be taken care of.</p>
 <p><em>Example: Specifying configuration options when starting the scheduler</em></p>
-<div class="highlight-shell notranslate"><div class="highlight"><pre><span></span>./ballista-scheduler<span class="w"> </span>--scheduler-policy<span class="w"> </span>push-staged<span class="w"> </span>--event-loop-buffer-size<span class="w"> </span><span class="m">1000000</span><span class="w"> </span>--executor-slots-policy
-round-robin-local
+<div class="highlight-shell notranslate"><div class="highlight"><pre><span></span>./ballista-scheduler<span class="w"> </span>--scheduler-policy<span class="w"> </span>push-staged<span class="w"> </span>--event-loop-buffer-size<span class="w"> </span><span class="m">1000000</span><span class="w"> </span>--task-distribution<span class="w"> </span>round-robin
 </pre></div>
 </div>
 <table class="table">
@@ -401,10 +400,10 @@ round-robin-local
 <td><p>10000</p></td>
 <td><p>Sets the event loop buffer size. for a system of high throughput, a larger value like 1000000 is recommended.</p></td>
 </tr>
-<tr class="row-even"><td><p>executor-slots-policy</p></td>
+<tr class="row-even"><td><p>task-distribution</p></td>
 <td><p>Utf8</p></td>
 <td><p>bias</p></td>
-<td><p>Sets the executor slots policy for the scheduler, possible values: bias, round-robin, round-robin-local. For a cluster with single scheduler, round-robin-local is recommended.</p></td>
+<td><p>Sets the task distribution policy for the scheduler, possible values: bias, round-robin, consistent-hash.</p></td>
 </tr>
 <tr class="row-odd"><td><p>finished-job-data-clean-up-interval-seconds</p></td>
 <td><p>UInt64</p></td>
diff --git a/user-guide/deployment/docker-compose.html b/user-guide/deployment/docker-compose.html
index c241d8fd..0afcc837 100644
--- a/user-guide/deployment/docker-compose.html
+++ b/user-guide/deployment/docker-compose.html
@@ -333,15 +333,11 @@ source repository, run the following command to start a cluster:</p>
 <p>This should show output similar to the following:</p>
 <div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span>docker-compose<span class="w"> </span>up
 Creating<span class="w"> </span>network<span class="w"> </span><span class="s2">&quot;ballista-benchmarks_default&quot;</span><span class="w"> </span>with<span class="w"> </span>the<span class="w"> </span>default<span class="w"> </span>driver
-Creating<span class="w"> </span>ballista-benchmarks_etcd_1<span class="w"> </span>...<span class="w"> </span><span class="k">done</span>
 Creating<span class="w"> </span>ballista-benchmarks_ballista-scheduler_1<span class="w"> </span>...<span class="w"> </span><span class="k">done</span>
 Creating<span class="w"> </span>ballista-benchmarks_ballista-executor_1<span class="w">  </span>...<span class="w"> </span><span class="k">done</span>
-Attaching<span class="w"> </span>to<span class="w"> </span>ballista-benchmarks_etcd_1,<span class="w"> </span>ballista-benchmarks_ballista-scheduler_1,<span class="w"> </span>ballista-benchmarks_ballista-executor_1
-ballista-executor_1<span class="w">   </span><span class="p">|</span><span class="w"> </span><span class="o">[</span><span class="m">2021</span>-08-28T15:55:22Z<span class="w"> </span>INFO<span class="w">  </span>ballista_executor<span class="o">]</span><span class="w"> </span>Running<span class="w"> </span>with<span class="w"> </span>config:
-ballista-executor_1<span class="w">   </span><span class="p">|</span><span class="w"> </span><span class="o">[</span><span class="m">2021</span>-08-28T15:55:22Z<span class="w"> </span>INFO<span class="w">  </span>ballista_executor<span class="o">]</span><span class="w"> </span>work_dir:<span class="w"> </span>/tmp/.tmpLVx39c
-ballista-executor_1<span class="w">   </span><span class="p">|</span><span class="w"> </span><span class="o">[</span><span class="m">2021</span>-08-28T15:55:22Z<span class="w"> </span>INFO<span class="w">  </span>ballista_executor<span class="o">]</span><span class="w"> </span>concurrent_tasks:<span class="w"> </span><span class="m">4</span>
-ballista-scheduler_1<span class="w">  </span><span class="p">|</span><span class="w"> </span><span class="o">[</span><span class="m">2021</span>-08-28T15:55:22Z<span class="w"> </span>INFO<span class="w">  </span>ballista_scheduler<span class="o">]</span><span class="w"> </span>Ballista<span class="w"> </span>v0.12.0<span class="w"> </span>Scheduler<span class="w"> </span>listening<span class="w"> </span>on<span class="w"> </span><span class="m">0</span>.0.0.0:50050
-ballista-executor_1<span class="w">   </span><span class="p">|</span><span class="w"> </span><span class="o">[</span><span class="m">2021</span>-08-28T15:55:22Z<span class="w"> </span>INFO<span class="w">  </span>ballista_executor<span class="o">]</span><span class="w"> </span>Ballista<span class="w"> </span>v0.12.0<span class="w"> </span>Rust<span class="w"> </span>Executor<span class="w"> </span>listening<span class="w"> </span>on<span class="w"> </span><span class="m">0</span>.0.0.0:50051
+Attaching<span class="w"> </span>to<span class="w"> </span>ballista-benchmarks_ballista-scheduler_1,<span class="w"> </span>ballista-benchmarks_ballista-executor_1
+ballista-scheduler_1<span class="w">  </span><span class="p">|</span><span class="w"> </span>INFO<span class="w"> </span>ballista_scheduler:<span class="w"> </span>Ballista<span class="w"> </span>v51.0.0<span class="w"> </span>Scheduler<span class="w"> </span>listening<span class="w"> </span>on<span class="w"> </span><span class="m">0</span>.0.0.0:50050
+ballista-executor_1<span class="w">   </span><span class="p">|</span><span class="w"> </span>INFO<span class="w"> </span>ballista_executor:<span class="w"> </span>Ballista<span class="w"> </span>v51.0.0<span class="w"> </span>Rust<span class="w"> </span>Executor<span class="w"> </span>listening<span class="w"> </span>on<span class="w"> </span><span class="m">0</span>.0.0.0:50051
 </pre></div>
 </div>
 <p>The scheduler listens on port 50050 and this is the port that clients will need to connect to.</p>
diff --git a/user-guide/deployment/docker.html b/user-guide/deployment/docker.html
index 17192b13..c71f26af 100644
--- a/user-guide/deployment/docker.html
+++ b/user-guide/deployment/docker.html
@@ -369,13 +369,10 @@ a756055576f3   apache/datafusion-ballista-scheduler:latest   &quot;/root/schedul
 </div>
 <p>Run <code class="docutils literal notranslate"><span class="pre">docker</span> <span class="pre">logs</span> <span class="pre">CONTAINER_ID</span></code> to check the output from the process:</p>
 <div class="highlight-default notranslate"><div class="highlight"><pre><span></span>$ docker logs a756055576f3
-2024-02-03T14:49:47.904571Z  INFO main ThreadId(01) ballista_scheduler::cluster: Initializing Sled database in temp directory
-
-2024-02-03T14:49:47.924679Z  INFO main ThreadId(01) ballista_scheduler::scheduler_process: Ballista v0.12.0 Scheduler listening on 0.0.0.0:50050
-2024-02-03T14:49:47.924709Z  INFO main ThreadId(01) ballista_scheduler::scheduler_process: Starting Scheduler grpc server with task scheduling policy of PullStaged
-2024-02-03T14:49:47.925261Z  INFO main ThreadId(01) ballista_scheduler::cluster::kv: Initializing heartbeat listener
-2024-02-03T14:49:47.925476Z  INFO main ThreadId(01) ballista_scheduler::scheduler_server::query_stage_scheduler: Starting QueryStageScheduler
-2024-02-03T14:49:47.925587Z  INFO tokio-runtime-worker ThreadId(47) ballista_core::event_loop: Starting the event loop query_stage
+INFO ballista_scheduler::scheduler_process: Ballista v51.0.0 Scheduler listening on 0.0.0.0:50050
+INFO ballista_scheduler::scheduler_process: Starting Scheduler grpc server with task scheduling policy of PullStaged
+INFO ballista_scheduler::scheduler_server::query_stage_scheduler: Starting QueryStageScheduler
+INFO ballista_core::event_loop: Starting the event loop query_stage
 </pre></div>
 </div>
 </section>
@@ -396,11 +393,11 @@ a756055576f3   apache/datafusion-ballista-scheduler:latest   &quot;/root/schedul
 </div>
 <p>Use <code class="docutils literal notranslate"><span class="pre">docker</span> <span class="pre">logs</span> <span class="pre">CONTAINER_ID</span></code> to check the output from the executor(s):</p>
 <div class="highlight-default notranslate"><div class="highlight"><pre><span></span>$ docker logs fb8b530cee6d
-2024-02-03T14:50:24.061607Z  INFO main ThreadId(01) ballista_executor::executor_process: Running with config:
-2024-02-03T14:50:24.061649Z  INFO main ThreadId(01) ballista_executor::executor_process: work_dir: /tmp/.tmpAkP3pZ
-2024-02-03T14:50:24.061655Z  INFO main ThreadId(01) ballista_executor::executor_process: concurrent_tasks: 48
-2024-02-03T14:50:24.063256Z  INFO tokio-runtime-worker ThreadId(44) ballista_executor::executor_process: Ballista v0.12.0 Rust Executor Flight Server listening on 0.0.0.0:50051
-2024-02-03T14:50:24.063281Z  INFO tokio-runtime-worker ThreadId(47) ballista_executor::execution_loop: Starting poll work loop with scheduler
+INFO ballista_executor::executor_process: Running with config:
+INFO ballista_executor::executor_process: work_dir: /tmp/.tmpAkP3pZ
+INFO ballista_executor::executor_process: concurrent_tasks: 48
+INFO ballista_executor::executor_process: Ballista v51.0.0 Rust Executor Flight Server listening on 0.0.0.0:50051
+INFO ballista_executor::execution_loop: Starting poll work loop with scheduler
 </pre></div>
 </div>
 </section>
diff --git a/user-guide/deployment/kubernetes.html b/user-guide/deployment/kubernetes.html
index a52e78c8..11808791 100644
--- a/user-guide/deployment/kubernetes.html
+++ b/user-guide/deployment/kubernetes.html
@@ -368,10 +368,10 @@ must be enabled using the following command.</p>
 <section id="publishing-docker-images">
 <h2>Publishing Docker Images<a class="headerlink" href="#publishing-docker-images" title="Link to this heading">¶</a></h2>
 <p>Once the images have been built, you can retag them and can push them to your favourite Docker registry.</p>
-<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>docker<span class="w"> </span>tag<span class="w"> </span>apache/datafusion-ballista-scheduler:0.12.0<span class="w"> </span>&lt;your-repo&gt;/datafusion-ballista-scheduler:0.12.0
-docker<span class="w"> </span>tag<span class="w"> </span>apache/datafusion-ballista-executor:0.12.0<span class="w"> </span>&lt;your-repo&gt;/datafusion-ballista-executor:0.12.0
-docker<span class="w"> </span>push<span class="w"> </span>&lt;your-repo&gt;/datafusion-ballista-scheduler:0.12.0
-docker<span class="w"> </span>push<span class="w"> </span>&lt;your-repo&gt;/datafusion-ballista-executor:0.12.0
+<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>docker<span class="w"> </span>tag<span class="w"> </span>apache/datafusion-ballista-scheduler:latest<span class="w"> </span>&lt;your-repo&gt;/datafusion-ballista-scheduler:latest
+docker<span class="w"> </span>tag<span class="w"> </span>apache/datafusion-ballista-executor:latest<span class="w"> </span>&lt;your-repo&gt;/datafusion-ballista-executor:latest
+docker<span class="w"> </span>push<span class="w"> </span>&lt;your-repo&gt;/datafusion-ballista-scheduler:latest
+docker<span class="w"> </span>push<span class="w"> </span>&lt;your-repo&gt;/datafusion-ballista-executor:latest
 </pre></div>
 </div>
 </section>
@@ -453,7 +453,7 @@ persistentvolumeclaim/data-pv-claim<span class="w"> 
</span>created
 <span class="w">    </span><span class="nt">spec</span><span class="p">:</span>
 <span class="w">      </span><span class="nt">containers</span><span 
class="p">:</span>
 <span class="w">        </span><span class="p p-Indicator">-</span><span 
class="w"> </span><span class="nt">name</span><span class="p">:</span><span 
class="w"> </span><span class="l l-Scalar 
l-Scalar-Plain">ballista-scheduler</span>
-<span class="w">          </span><span class="nt">image</span><span 
class="p">:</span><span class="w"> </span><span class="l l-Scalar 
l-Scalar-Plain">&lt;your-repo&gt;/datafusion-ballista-scheduler:0.12.0</span>
+<span class="w">          </span><span class="nt">image</span><span 
class="p">:</span><span class="w"> </span><span class="l l-Scalar 
l-Scalar-Plain">&lt;your-repo&gt;/datafusion-ballista-scheduler:latest</span>
 <span class="w">          </span><span class="nt">args</span><span 
class="p">:</span><span class="w"> </span><span class="p 
p-Indicator">[</span><span class="s">&quot;--bind-port=50050&quot;</span><span 
class="p p-Indicator">]</span>
 <span class="w">          </span><span class="nt">ports</span><span 
class="p">:</span>
 <span class="w">            </span><span class="p p-Indicator">-</span><span 
class="w"> </span><span class="nt">containerPort</span><span 
class="p">:</span><span class="w"> </span><span class="l l-Scalar 
l-Scalar-Plain">50050</span>
@@ -483,7 +483,7 @@ persistentvolumeclaim/data-pv-claim<span class="w"> 
</span>created
 <span class="w">    </span><span class="nt">spec</span><span class="p">:</span>
 <span class="w">      </span><span class="nt">containers</span><span 
class="p">:</span>
 <span class="w">        </span><span class="p p-Indicator">-</span><span 
class="w"> </span><span class="nt">name</span><span class="p">:</span><span 
class="w"> </span><span class="l l-Scalar 
l-Scalar-Plain">ballista-executor</span>
-<span class="w">          </span><span class="nt">image</span><span 
class="p">:</span><span class="w"> </span><span class="l l-Scalar 
l-Scalar-Plain">&lt;your-repo&gt;/datafusion-ballista-executor:0.12.0</span>
+<span class="w">          </span><span class="nt">image</span><span 
class="p">:</span><span class="w"> </span><span class="l l-Scalar 
l-Scalar-Plain">&lt;your-repo&gt;/datafusion-ballista-executor:latest</span>
 <span class="w">          </span><span class="nt">args</span><span 
class="p">:</span>
 <span class="w">            </span><span class="p p-Indicator">-</span><span 
class="w"> </span><span class="s">&quot;--bind-port=50051&quot;</span>
 <span class="w">            </span><span class="p p-Indicator">-</span><span 
class="w"> </span><span 
class="s">&quot;--scheduler-host=ballista-scheduler&quot;</span>
@@ -517,11 +517,11 @@ ballista-executor-78cc5b6486-7crdm<span class="w">   
</span><span class="m">0</s
 ballista-scheduler-879f874c5-rnbd6<span class="w">   </span><span 
class="m">0</span>/1<span class="w">     </span>Pending<span class="w">   
</span><span class="m">0</span><span class="w">          </span>42s
 </pre></div>
 </div>
-<p>You can view the scheduler logs with <code class="docutils literal 
notranslate"><span class="pre">kubectl</span> <span class="pre">logs</span> 
<span class="pre">ballista-scheduler-0</span></code>:</p>
-<div class="highlight-default notranslate"><div 
class="highlight"><pre><span></span>$ kubectl logs ballista-scheduler-0
-[2021-02-19T00:24:01Z INFO  scheduler] Ballista v0.7.0 Scheduler listening on 
0.0.0.0:50050
-[2021-02-19T00:24:16Z INFO  ballista::scheduler] Received register_executor 
request for ExecutorMetadata { id: 
&quot;b5e81711-1c5c-46ec-8522-d8b359793188&quot;, host: 
&quot;10.1.23.149&quot;, port: 50051 }
-[2021-02-19T00:24:17Z INFO  ballista::scheduler] Received register_executor 
request for ExecutorMetadata { id: 
&quot;816e4502-a876-4ed8-b33f-86d243dcf63f&quot;, host: 
&quot;10.1.23.150&quot;, port: 50051 }
+<p>You can view the scheduler logs with <code class="docutils literal 
notranslate"><span class="pre">kubectl</span> <span class="pre">logs</span> 
<span class="pre">ballista-scheduler-&lt;pod-id&gt;</span></code>:</p>
+<div class="highlight-default notranslate"><div 
class="highlight"><pre><span></span>$ kubectl logs 
ballista-scheduler-&lt;pod-id&gt;
+INFO ballista_scheduler::scheduler_process: Ballista v51.0.0 Scheduler 
listening on 0.0.0.0:50050
+INFO ballista_scheduler::scheduler_server::grpc: Received register_executor 
request for ExecutorMetadata { id: 
&quot;b5e81711-1c5c-46ec-8522-d8b359793188&quot;, host: 
&quot;10.1.23.149&quot;, port: 50051 }
+INFO ballista_scheduler::scheduler_server::grpc: Received register_executor 
request for ExecutorMetadata { id: 
&quot;816e4502-a876-4ed8-b33f-86d243dcf63f&quot;, host: 
&quot;10.1.23.150&quot;, port: 50051 }
 </pre></div>
 </div>
 </section>
diff --git a/user-guide/python.html b/user-guide/python.html
index 7a179e9a..476975e7 100644
--- a/user-guide/python.html
+++ b/user-guide/python.html
@@ -437,12 +437,8 @@ COUNT(UInt8(1)): int64]
 <div class="highlight-python notranslate"><div 
class="highlight"><pre><span></span><span class="kn">from</span><span 
class="w"> </span><span class="nn">ballista</span><span class="w"> </span><span 
class="kn">import</span> <span class="n">BallistaBuilder</span>
 <span class="kn">import</span><span class="w"> </span><span 
class="nn">pyarrow</span>
 
-<span class="c1"># an alias</span>
-<span class="c1"># TODO implement Functions</span>
-<span class="n">f</span> <span class="o">=</span> <span 
class="n">ballista</span><span class="o">.</span><span 
class="n">functions</span>
-
 <span class="c1"># create a context</span>
-<span class="n">ctx</span> <span class="o">=</span> <span 
class="n">Ballista</span><span class="p">()</span><span class="o">.</span><span 
class="n">standalone</span><span class="p">()</span>
+<span class="n">ctx</span> <span class="o">=</span> <span 
class="n">BallistaBuilder</span><span class="p">()</span><span 
class="o">.</span><span class="n">standalone</span><span class="p">()</span>
 
 <span class="c1"># create a RecordBatch and a new DataFrame from it</span>
 <span class="n">batch</span> <span class="o">=</span> <span 
class="n">pyarrow</span><span class="o">.</span><span 
class="n">RecordBatch</span><span class="o">.</span><span 
class="n">from_arrays</span><span class="p">(</span>
@@ -452,9 +448,10 @@ COUNT(UInt8(1)): int64]
 <span class="n">df</span> <span class="o">=</span> <span 
class="n">ctx</span><span class="o">.</span><span 
class="n">create_dataframe</span><span class="p">([[</span><span 
class="n">batch</span><span class="p">]])</span>
 
 <span class="c1"># create a new statement</span>
+<span class="kn">from</span><span class="w"> </span><span 
class="nn">datafusion</span><span class="w"> </span><span 
class="kn">import</span> <span class="n">col</span>
 <span class="n">df</span> <span class="o">=</span> <span 
class="n">df</span><span class="o">.</span><span class="n">select</span><span 
class="p">(</span>
-    <span class="n">f</span><span class="o">.</span><span 
class="n">col</span><span class="p">(</span><span 
class="s2">&quot;a&quot;</span><span class="p">)</span> <span 
class="o">+</span> <span class="n">f</span><span class="o">.</span><span 
class="n">col</span><span class="p">(</span><span 
class="s2">&quot;b&quot;</span><span class="p">),</span>
-    <span class="n">f</span><span class="o">.</span><span 
class="n">col</span><span class="p">(</span><span 
class="s2">&quot;a&quot;</span><span class="p">)</span> <span 
class="o">-</span> <span class="n">f</span><span class="o">.</span><span 
class="n">col</span><span class="p">(</span><span 
class="s2">&quot;b&quot;</span><span class="p">),</span>
+    <span class="n">col</span><span class="p">(</span><span 
class="s2">&quot;a&quot;</span><span class="p">)</span> <span 
class="o">+</span> <span class="n">col</span><span class="p">(</span><span 
class="s2">&quot;b&quot;</span><span class="p">),</span>
+    <span class="n">col</span><span class="p">(</span><span 
class="s2">&quot;a&quot;</span><span class="p">)</span> <span 
class="o">-</span> <span class="n">col</span><span class="p">(</span><span 
class="s2">&quot;b&quot;</span><span class="p">),</span>
 <span class="p">)</span>
 
 <span class="c1"># execute and collect the first (and only) batch</span>
diff --git a/user-guide/tuning-guide.html b/user-guide/tuning-guide.html
index e6e94613..b2485c51 100644
--- a/user-guide/tuning-guide.html
+++ b/user-guide/tuning-guide.html
@@ -368,8 +368,8 @@ memory, as well as supporting spill-to-disk to reduce 
memory pressure.</p>
 <p>Ballista supports both push-based and pull-based task scheduling. It is 
recommended that you try both to determine
 which is the best for your use case.</p>
 <p>Pull-based scheduling works in a similar way to Apache Spark and push-based 
scheduling can result in lower latency.</p>
-<p>The scheduling policy can be specified in the <code class="docutils literal 
notranslate"><span class="pre">--scheduler_policy</span></code> parameter when 
starting the scheduler and executor
-processes. The default is <code class="docutils literal notranslate"><span 
class="pre">pull-based</span></code>.</p>
+<p>The scheduling policy can be specified in the <code class="docutils literal 
notranslate"><span class="pre">--scheduler-policy</span></code> parameter when 
starting the scheduler and executor
+processes. The default is <code class="docutils literal notranslate"><span 
class="pre">pull-staged</span></code>.</p>
 </section>
 <section id="viewing-query-plans-and-metrics">
 <h2>Viewing Query Plans and Metrics<a class="headerlink" 
href="#viewing-query-plans-and-metrics" title="Link to this heading">¶</a></h2>

