This is an automated email from the ASF dual-hosted git repository.
github-bot pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/datafusion-ballista.git
The following commit(s) were added to refs/heads/asf-site by this push:
new 74ddb6f2 Publish built docs triggered by 5ec9ae6b0b5855e7744aa93e6a9acfee25f5ceab
74ddb6f2 is described below
commit 74ddb6f296df6ddc78138bd588ab8260baa3c675
Author: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
AuthorDate: Sat Jan 17 19:19:17 2026 +0000
Publish built docs triggered by 5ec9ae6b0b5855e7744aa93e6a9acfee25f5ceab
---
_sources/contributors-guide/architecture.md.txt | 3 ---
_sources/contributors-guide/development.md.txt | 11 +++++++++++
_sources/user-guide/cli.md.txt | 2 +-
_sources/user-guide/configs.md.txt | 17 ++++++++---------
.../user-guide/deployment/docker-compose.md.txt | 10 +++-------
_sources/user-guide/deployment/docker.md.txt | 21 +++++++++------------
_sources/user-guide/deployment/kubernetes.md.txt | 22 +++++++++++-----------
_sources/user-guide/python.md.txt | 11 ++++-------
_sources/user-guide/tuning-guide.md.txt | 4 ++--
contributors-guide/architecture.html | 2 --
contributors-guide/development.html | 16 ++++++++++++++++
searchindex.js | 2 +-
user-guide/cli.html | 2 +-
user-guide/configs.html | 7 +++----
user-guide/deployment/docker-compose.html | 10 +++-------
user-guide/deployment/docker.html | 21 +++++++++------------
user-guide/deployment/kubernetes.html | 22 +++++++++++-----------
user-guide/python.html | 11 ++++-------
user-guide/tuning-guide.html | 4 ++--
19 files changed, 99 insertions(+), 99 deletions(-)
diff --git a/_sources/contributors-guide/architecture.md.txt b/_sources/contributors-guide/architecture.md.txt
index b0e00730..b25321d0 100644
--- a/_sources/contributors-guide/architecture.md.txt
+++ b/_sources/contributors-guide/architecture.md.txt
@@ -80,9 +80,6 @@ plan or a SQL string. The scheduler then creates an execution graph, which conta
stages (pipelines) that can be scheduled independently. This process is explained in detail in the Distributed
Query Scheduling section of this guide.
-It is possible to have multiple schedulers running with shared state in etcd, so that jobs can continue to run
-even if a scheduler process fails.
-
### Executor
The executor processes connect to a scheduler and poll for tasks to perform. These tasks are physical plans in
diff --git a/_sources/contributors-guide/development.md.txt b/_sources/contributors-guide/development.md.txt
index a21595b1..feefc8a6 100644
--- a/_sources/contributors-guide/development.md.txt
+++ b/_sources/contributors-guide/development.md.txt
@@ -65,3 +65,14 @@ cargo test
cd examples
cargo run --example standalone_sql --features=ballista/standalone
```
+
+## Benchmarking
+
+For performance testing and benchmarking with TPC-H and other datasets, see the [benchmarks README](../../../benchmarks/README.md).
+
+This includes instructions for:
+
+- Generating TPC-H test data
+- Running benchmarks against DataFusion and Ballista
+- Comparing performance with Apache Spark
+- Running load tests
diff --git a/_sources/user-guide/cli.md.txt b/_sources/user-guide/cli.md.txt
index 213f6034..597bc195 100644
--- a/_sources/user-guide/cli.md.txt
+++ b/_sources/user-guide/cli.md.txt
@@ -71,7 +71,7 @@ It is also possible to run the CLI in standalone mode, where it will create a sc
```bash
$ ballista-cli
-Ballista CLI v8.0.0
+Ballista CLI v51.0.0
> CREATE EXTERNAL TABLE foo (a INT, b INT) STORED AS CSV LOCATION 'data.csv';
0 rows in set. Query took 0.001 seconds.
diff --git a/_sources/user-guide/configs.md.txt b/_sources/user-guide/configs.md.txt
index 5e909e15..56b847e1 100644
--- a/_sources/user-guide/configs.md.txt
+++ b/_sources/user-guide/configs.md.txt
@@ -96,14 +96,13 @@ manage the whole cluster are also needed to be taken care of.
_Example: Specifying configuration options when starting the scheduler_
```shell
-./ballista-scheduler --scheduler-policy push-staged --event-loop-buffer-size 1000000 --executor-slots-policy
-round-robin-local
+./ballista-scheduler --scheduler-policy push-staged --event-loop-buffer-size 1000000 --task-distribution round-robin
```
-| key                                          | type   | default     | description                                                                                                                                                                     |
-| -------------------------------------------- | ------ | ----------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
-| scheduler-policy                             | Utf8   | pull-staged | Sets the task scheduling policy for the scheduler, possible values: pull-staged, push-staged.                                                                                   |
-| event-loop-buffer-size                       | UInt32 | 10000       | Sets the event loop buffer size. for a system of high throughput, a larger value like 1000000 is recommended.                                                                   |
-| executor-slots-policy                        | Utf8   | bias        | Sets the executor slots policy for the scheduler, possible values: bias, round-robin, round-robin-local. For a cluster with single scheduler, round-robin-local is recommended. |
-| finished-job-data-clean-up-interval-seconds  | UInt64 | 300         | Sets the delayed interval for cleaning up finished job data, mainly the shuffle data, 0 means the cleaning up is disabled.                                                      |
-| finished-job-state-clean-up-interval-seconds | UInt64 | 3600        | Sets the delayed interval for cleaning up finished job state stored in the backend, 0 means the cleaning up is disabled.                                                        |
+| key                                          | type   | default     | description                                                                                                                |
+| -------------------------------------------- | ------ | ----------- | -------------------------------------------------------------------------------------------------------------------------- |
+| scheduler-policy                             | Utf8   | pull-staged | Sets the task scheduling policy for the scheduler, possible values: pull-staged, push-staged.                              |
+| event-loop-buffer-size                       | UInt32 | 10000       | Sets the event loop buffer size. for a system of high throughput, a larger value like 1000000 is recommended.              |
+| task-distribution                            | Utf8   | bias        | Sets the task distribution policy for the scheduler, possible values: bias, round-robin, consistent-hash.                  |
+| finished-job-data-clean-up-interval-seconds  | UInt64 | 300         | Sets the delayed interval for cleaning up finished job data, mainly the shuffle data, 0 means the cleaning up is disabled. |
+| finished-job-state-clean-up-interval-seconds | UInt64 | 3600        | Sets the delayed interval for cleaning up finished job state stored in the backend, 0 means the cleaning up is disabled.   |
diff --git a/_sources/user-guide/deployment/docker-compose.md.txt b/_sources/user-guide/deployment/docker-compose.md.txt
index 67f40b7a..96f69d69 100644
--- a/_sources/user-guide/deployment/docker-compose.md.txt
+++ b/_sources/user-guide/deployment/docker-compose.md.txt
@@ -39,15 +39,11 @@ This should show output similar to the following:
```bash
$ docker-compose up
Creating network "ballista-benchmarks_default" with the default driver
-Creating ballista-benchmarks_etcd_1 ... done
Creating ballista-benchmarks_ballista-scheduler_1 ... done
Creating ballista-benchmarks_ballista-executor_1 ... done
-Attaching to ballista-benchmarks_etcd_1, ballista-benchmarks_ballista-scheduler_1, ballista-benchmarks_ballista-executor_1
-ballista-executor_1 | [2021-08-28T15:55:22Z INFO ballista_executor] Running with config:
-ballista-executor_1 | [2021-08-28T15:55:22Z INFO ballista_executor] work_dir: /tmp/.tmpLVx39c
-ballista-executor_1 | [2021-08-28T15:55:22Z INFO ballista_executor] concurrent_tasks: 4
-ballista-scheduler_1 | [2021-08-28T15:55:22Z INFO ballista_scheduler] Ballista v0.12.0 Scheduler listening on 0.0.0.0:50050
-ballista-executor_1 | [2021-08-28T15:55:22Z INFO ballista_executor] Ballista v0.12.0 Rust Executor listening on 0.0.0.0:50051
+Attaching to ballista-benchmarks_ballista-scheduler_1, ballista-benchmarks_ballista-executor_1
+ballista-scheduler_1 | INFO ballista_scheduler: Ballista v51.0.0 Scheduler listening on 0.0.0.0:50050
+ballista-executor_1 | INFO ballista_executor: Ballista v51.0.0 Rust Executor listening on 0.0.0.0:50051
```
The scheduler listens on port 50050 and this is the port that clients will need to connect to.
diff --git a/_sources/user-guide/deployment/docker.md.txt b/_sources/user-guide/deployment/docker.md.txt
index a0542377..cf5488a6 100644
--- a/_sources/user-guide/deployment/docker.md.txt
+++ b/_sources/user-guide/deployment/docker.md.txt
@@ -67,13 +67,10 @@ Run `docker logs CONTAINER_ID` to check the output from the process:
```
$ docker logs a756055576f3
-2024-02-03T14:49:47.904571Z INFO main ThreadId(01) ballista_scheduler::cluster: Initializing Sled database in temp directory
-
-2024-02-03T14:49:47.924679Z INFO main ThreadId(01) ballista_scheduler::scheduler_process: Ballista v0.12.0 Scheduler listening on 0.0.0.0:50050
-2024-02-03T14:49:47.924709Z INFO main ThreadId(01) ballista_scheduler::scheduler_process: Starting Scheduler grpc server with task scheduling policy of PullStaged
-2024-02-03T14:49:47.925261Z INFO main ThreadId(01) ballista_scheduler::cluster::kv: Initializing heartbeat listener
-2024-02-03T14:49:47.925476Z INFO main ThreadId(01) ballista_scheduler::scheduler_server::query_stage_scheduler: Starting QueryStageScheduler
-2024-02-03T14:49:47.925587Z INFO tokio-runtime-worker ThreadId(47) ballista_core::event_loop: Starting the event loop query_stage
+INFO ballista_scheduler::scheduler_process: Ballista v51.0.0 Scheduler listening on 0.0.0.0:50050
+INFO ballista_scheduler::scheduler_process: Starting Scheduler grpc server with task scheduling policy of PullStaged
+INFO ballista_scheduler::scheduler_server::query_stage_scheduler: Starting QueryStageScheduler
+INFO ballista_core::event_loop: Starting the event loop query_stage
```
### Start Executors
@@ -99,11 +96,11 @@ Use `docker logs CONTAINER_ID` to check the output from the executor(s):
```
$ docker logs fb8b530cee6d
-2024-02-03T14:50:24.061607Z INFO main ThreadId(01) ballista_executor::executor_process: Running with config:
-2024-02-03T14:50:24.061649Z INFO main ThreadId(01) ballista_executor::executor_process: work_dir: /tmp/.tmpAkP3pZ
-2024-02-03T14:50:24.061655Z INFO main ThreadId(01) ballista_executor::executor_process: concurrent_tasks: 48
-2024-02-03T14:50:24.063256Z INFO tokio-runtime-worker ThreadId(44) ballista_executor::executor_process: Ballista v0.12.0 Rust Executor Flight Server listening on 0.0.0.0:50051
-2024-02-03T14:50:24.063281Z INFO tokio-runtime-worker ThreadId(47) ballista_executor::execution_loop: Starting poll work loop with scheduler
+INFO ballista_executor::executor_process: Running with config:
+INFO ballista_executor::executor_process: work_dir: /tmp/.tmpAkP3pZ
+INFO ballista_executor::executor_process: concurrent_tasks: 48
+INFO ballista_executor::executor_process: Ballista v51.0.0 Rust Executor Flight Server listening on 0.0.0.0:50051
+INFO ballista_executor::execution_loop: Starting poll work loop with scheduler
```
## Connect from the CLI
diff --git a/_sources/user-guide/deployment/kubernetes.md.txt b/_sources/user-guide/deployment/kubernetes.md.txt
index d3062ed8..3e25d1f9 100644
--- a/_sources/user-guide/deployment/kubernetes.md.txt
+++ b/_sources/user-guide/deployment/kubernetes.md.txt
@@ -48,10 +48,10 @@ To create the required Docker images please refer to the [docker deployment page
Once the images have been built, you can retag them and can push them to your favourite Docker registry.
```bash
-docker tag apache/datafusion-ballista-scheduler:0.12.0 <your-repo>/datafusion-ballista-scheduler:0.12.0
-docker tag apache/datafusion-ballista-executor:0.12.0 <your-repo>/datafusion-ballista-executor:0.12.0
-docker push <your-repo>/datafusion-ballista-scheduler:0.12.0
-docker push <your-repo>/datafusion-ballista-executor:0.12.0
+docker tag apache/datafusion-ballista-scheduler:latest <your-repo>/datafusion-ballista-scheduler:latest
+docker tag apache/datafusion-ballista-executor:latest <your-repo>/datafusion-ballista-executor:latest
+docker push <your-repo>/datafusion-ballista-scheduler:latest
+docker push <your-repo>/datafusion-ballista-executor:latest
```
## Create Persistent Volume and Persistent Volume Claim
@@ -139,7 +139,7 @@ spec:
spec:
containers:
- name: ballista-scheduler
- image: <your-repo>/datafusion-ballista-scheduler:0.12.0
+ image: <your-repo>/datafusion-ballista-scheduler:latest
args: ["--bind-port=50050"]
ports:
- containerPort: 50050
@@ -169,7 +169,7 @@ spec:
spec:
containers:
- name: ballista-executor
- image: <your-repo>/datafusion-ballista-executor:0.12.0
+ image: <your-repo>/datafusion-ballista-executor:latest
args:
- "--bind-port=50051"
- "--scheduler-host=ballista-scheduler"
@@ -208,13 +208,13 @@ ballista-executor-78cc5b6486-7crdm 0/1 Pending 0 42s
ballista-scheduler-879f874c5-rnbd6 0/1 Pending 0 42s
```
-You can view the scheduler logs with `kubectl logs ballista-scheduler-0`:
+You can view the scheduler logs with `kubectl logs ballista-scheduler-<pod-id>`:
```
-$ kubectl logs ballista-scheduler-0
-[2021-02-19T00:24:01Z INFO scheduler] Ballista v0.7.0 Scheduler listening on 0.0.0.0:50050
-[2021-02-19T00:24:16Z INFO ballista::scheduler] Received register_executor request for ExecutorMetadata { id: "b5e81711-1c5c-46ec-8522-d8b359793188", host: "10.1.23.149", port: 50051 }
-[2021-02-19T00:24:17Z INFO ballista::scheduler] Received register_executor request for ExecutorMetadata { id: "816e4502-a876-4ed8-b33f-86d243dcf63f", host: "10.1.23.150", port: 50051 }
+$ kubectl logs ballista-scheduler-<pod-id>
+INFO ballista_scheduler::scheduler_process: Ballista v51.0.0 Scheduler listening on 0.0.0.0:50050
+INFO ballista_scheduler::scheduler_server::grpc: Received register_executor request for ExecutorMetadata { id: "b5e81711-1c5c-46ec-8522-d8b359793188", host: "10.1.23.149", port: 50051 }
+INFO ballista_scheduler::scheduler_server::grpc: Received register_executor request for ExecutorMetadata { id: "816e4502-a876-4ed8-b33f-86d243dcf63f", host: "10.1.23.150", port: 50051 }
```
## Port Forwarding
diff --git a/_sources/user-guide/python.md.txt b/_sources/user-guide/python.md.txt
index f17ac68d..e7d13968 100644
--- a/_sources/user-guide/python.md.txt
+++ b/_sources/user-guide/python.md.txt
@@ -117,12 +117,8 @@ The following example demonstrates creating arrays with PyArrow and then creatin
from ballista import BallistaBuilder
import pyarrow
-# an alias
-# TODO implement Functions
-f = ballista.functions
-
# create a context
-ctx = Ballista().standalone()
+ctx = BallistaBuilder().standalone()
# create a RecordBatch and a new DataFrame from it
batch = pyarrow.RecordBatch.from_arrays(
@@ -132,9 +128,10 @@ batch = pyarrow.RecordBatch.from_arrays(
df = ctx.create_dataframe([[batch]])
# create a new statement
+from datafusion import col
df = df.select(
- f.col("a") + f.col("b"),
- f.col("a") - f.col("b"),
+ col("a") + col("b"),
+ col("a") - col("b"),
)
# execute and collect the first (and only) batch
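
For reference, the updated Python example reads end to end roughly as follows. This is a sketch assembled from the hunks above; the array contents, the column names, and the final `collect()` call are illustrative assumptions rather than part of this commit.

```python
# Minimal sketch of the updated example; assumed details are marked below.
from ballista import BallistaBuilder
from datafusion import col
import pyarrow

# create a standalone context (per the updated docs above)
ctx = BallistaBuilder().standalone()

# create a RecordBatch; the values and column names here are assumptions for illustration
batch = pyarrow.RecordBatch.from_arrays(
    [pyarrow.array([1, 2, 3]), pyarrow.array([4, 5, 6])],
    names=["a", "b"],
)

# build a DataFrame from the batch and project two derived columns
df = ctx.create_dataframe([[batch]])
df = df.select(
    col("a") + col("b"),
    col("a") - col("b"),
)

# execute and collect the first (and only) batch; collect() is assumed to
# behave like the DataFusion Python DataFrame API
result = df.collect()[0]
print(result)
```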
diff --git a/_sources/user-guide/tuning-guide.md.txt b/_sources/user-guide/tuning-guide.md.txt
index 22955b44..fe4363df 100644
--- a/_sources/user-guide/tuning-guide.md.txt
+++ b/_sources/user-guide/tuning-guide.md.txt
@@ -73,8 +73,8 @@ which is the best for your use case.
Pull-based scheduling works in a similar way to Apache Spark and push-based scheduling can result in lower latency.
-The scheduling policy can be specified in the `--scheduler_policy` parameter when starting the scheduler and executor
-processes. The default is `pull-based`.
+The scheduling policy can be specified in the `--scheduler-policy` parameter when starting the scheduler and executor
+processes. The default is `pull-staged`.
## Viewing Query Plans and Metrics
diff --git a/contributors-guide/architecture.html b/contributors-guide/architecture.html
index 367a7788..d21c1d12 100644
--- a/contributors-guide/architecture.html
+++ b/contributors-guide/architecture.html
@@ -430,8 +430,6 @@ between the executor(s) and the scheduler for fetching
tasks and reporting task
plan or a SQL string. The scheduler then creates an execution graph, which
contains a physical plan broken down into
stages (pipelines) that can be scheduled independently. This process is
explained in detail in the Distributed
Query Scheduling section of this guide.</p>
-<p>It is possible to have multiple schedulers running with shared state in
etcd, so that jobs can continue to run
-even if a scheduler process fails.</p>
</section>
<section id="executor">
<h3>Executor<a class="headerlink" href="#executor" title="Link to this
heading">¶</a></h3>
diff --git a/contributors-guide/development.html b/contributors-guide/development.html
index 476188da..364f1825 100644
--- a/contributors-guide/development.html
+++ b/contributors-guide/development.html
@@ -275,6 +275,11 @@
Running the examples
</a>
</li>
+ <li class="toc-h2 nav-item toc-entry">
+ <a class="reference internal nav-link" href="#benchmarking">
+ Benchmarking
+ </a>
+ </li>
</ul>
</nav>
@@ -366,6 +371,17 @@ cargo<span class="w"> </span>run<span class="w">
</span>--example<span class="w"
</pre></div>
</div>
</section>
+<section id="benchmarking">
+<h2>Benchmarking<a class="headerlink" href="#benchmarking" title="Link to this
heading">¶</a></h2>
+<p>For performance testing and benchmarking with TPC-H and other datasets, see
the <span class="xref myst">benchmarks README</span>.</p>
+<p>This includes instructions for:</p>
+<ul class="simple">
+<li><p>Generating TPC-H test data</p></li>
+<li><p>Running benchmarks against DataFusion and Ballista</p></li>
+<li><p>Comparing performance with Apache Spark</p></li>
+<li><p>Running load tests</p></li>
+</ul>
+</section>
</section>
diff --git a/searchindex.js b/searchindex.js
index 04b10380..325691b0 100644
--- a/searchindex.js
+++ b/searchindex.js
@@ -1 +1 @@
-Search.setIndex({"alltitles": {"Apache DataFusion Ballista": [[4, null]],
"Arrow-native": [[1, "arrow-native"]], "Autoscaling Executors": [[11,
"autoscaling-executors"]], "Ballista Architecture": [[1, null]], "Ballista Code
Organization": [[2, null]], "Ballista Command-line Interface": [[5, null]],
"Ballista Configuration Settings": [[6, "ballista-configuration-settings"]],
"Ballista Development": [[3, null]], "Ballista Python Bindings": [[17, null]],
"Ballista Quickstart": [[12, null]], [...]
\ No newline at end of file
+Search.setIndex({"alltitles": {"Apache DataFusion Ballista": [[4, null]],
"Arrow-native": [[1, "arrow-native"]], "Autoscaling Executors": [[11,
"autoscaling-executors"]], "Ballista Architecture": [[1, null]], "Ballista Code
Organization": [[2, null]], "Ballista Command-line Interface": [[5, null]],
"Ballista Configuration Settings": [[6, "ballista-configuration-settings"]],
"Ballista Development": [[3, null]], "Ballista Python Bindings": [[17, null]],
"Ballista Quickstart": [[12, null]], [...]
\ No newline at end of file
diff --git a/user-guide/cli.html b/user-guide/cli.html
index e25381fd..5227a93f 100644
--- a/user-guide/cli.html
+++ b/user-guide/cli.html
@@ -373,7 +373,7 @@ process.</p>
<p>It is also possible to run the CLI in standalone mode, where it will create
a scheduler and executor in-process.</p>
<div class="highlight-bash notranslate"><div
class="highlight"><pre><span></span>$<span class="w"> </span>ballista-cli
-Ballista<span class="w"> </span>CLI<span class="w"> </span>v8.0.0
+Ballista<span class="w"> </span>CLI<span class="w"> </span>v51.0.0
><span class="w"> </span>CREATE<span class="w"> </span>EXTERNAL<span
class="w"> </span>TABLE<span class="w"> </span>foo<span class="w"> </span><span
class="o">(</span>a<span class="w"> </span>INT,<span class="w"> </span>b<span
class="w"> </span>INT<span class="o">)</span><span class="w">
</span>STORED<span class="w"> </span>AS<span class="w"> </span>CSV<span
class="w"> </span>LOCATION<span class="w"> </span><span
class="s1">'data.csv'</span><span class="p">;</span>
<span class="m">0</span><span class="w"> </span>rows<span class="w">
</span><span class="k">in</span><span class="w"> </span>set.<span class="w">
</span>Query<span class="w"> </span>took<span class="w"> </span><span
class="m">0</span>.001<span class="w"> </span>seconds.
diff --git a/user-guide/configs.html b/user-guide/configs.html
index 4671f5fb..a888eefe 100644
--- a/user-guide/configs.html
+++ b/user-guide/configs.html
@@ -378,8 +378,7 @@
<p>Besides the BallistaContext configuration settings, a few configuration
settings for the Ballista scheduler to better
manage the whole cluster are also needed to be taken care of.</p>
<p><em>Example: Specifying configuration options when starting the
scheduler</em></p>
-<div class="highlight-shell notranslate"><div
class="highlight"><pre><span></span>./ballista-scheduler<span class="w">
</span>--scheduler-policy<span class="w"> </span>push-staged<span class="w">
</span>--event-loop-buffer-size<span class="w"> </span><span
class="m">1000000</span><span class="w"> </span>--executor-slots-policy
-round-robin-local
+<div class="highlight-shell notranslate"><div
class="highlight"><pre><span></span>./ballista-scheduler<span class="w">
</span>--scheduler-policy<span class="w"> </span>push-staged<span class="w">
</span>--event-loop-buffer-size<span class="w"> </span><span
class="m">1000000</span><span class="w"> </span>--task-distribution<span
class="w"> </span>round-robin
</pre></div>
</div>
<table class="table">
@@ -401,10 +400,10 @@ round-robin-local
<td><p>10000</p></td>
<td><p>Sets the event loop buffer size. for a system of high throughput, a
larger value like 1000000 is recommended.</p></td>
</tr>
-<tr class="row-even"><td><p>executor-slots-policy</p></td>
+<tr class="row-even"><td><p>task-distribution</p></td>
<td><p>Utf8</p></td>
<td><p>bias</p></td>
-<td><p>Sets the executor slots policy for the scheduler, possible values:
bias, round-robin, round-robin-local. For a cluster with single scheduler,
round-robin-local is recommended.</p></td>
+<td><p>Sets the task distribution policy for the scheduler, possible values:
bias, round-robin, consistent-hash.</p></td>
</tr>
<tr class="row-odd"><td><p>finished-job-data-clean-up-interval-seconds</p></td>
<td><p>UInt64</p></td>
diff --git a/user-guide/deployment/docker-compose.html b/user-guide/deployment/docker-compose.html
index c241d8fd..0afcc837 100644
--- a/user-guide/deployment/docker-compose.html
+++ b/user-guide/deployment/docker-compose.html
@@ -333,15 +333,11 @@ source repository, run the following command to start a
cluster:</p>
<p>This should show output similar to the following:</p>
<div class="highlight-bash notranslate"><div
class="highlight"><pre><span></span>$<span class="w">
</span>docker-compose<span class="w"> </span>up
Creating<span class="w"> </span>network<span class="w"> </span><span
class="s2">"ballista-benchmarks_default"</span><span class="w">
</span>with<span class="w"> </span>the<span class="w"> </span>default<span
class="w"> </span>driver
-Creating<span class="w"> </span>ballista-benchmarks_etcd_1<span class="w">
</span>...<span class="w"> </span><span class="k">done</span>
Creating<span class="w"> </span>ballista-benchmarks_ballista-scheduler_1<span
class="w"> </span>...<span class="w"> </span><span class="k">done</span>
Creating<span class="w"> </span>ballista-benchmarks_ballista-executor_1<span
class="w"> </span>...<span class="w"> </span><span class="k">done</span>
-Attaching<span class="w"> </span>to<span class="w">
</span>ballista-benchmarks_etcd_1,<span class="w">
</span>ballista-benchmarks_ballista-scheduler_1,<span class="w">
</span>ballista-benchmarks_ballista-executor_1
-ballista-executor_1<span class="w"> </span><span class="p">|</span><span
class="w"> </span><span class="o">[</span><span
class="m">2021</span>-08-28T15:55:22Z<span class="w"> </span>INFO<span
class="w"> </span>ballista_executor<span class="o">]</span><span class="w">
</span>Running<span class="w"> </span>with<span class="w"> </span>config:
-ballista-executor_1<span class="w"> </span><span class="p">|</span><span
class="w"> </span><span class="o">[</span><span
class="m">2021</span>-08-28T15:55:22Z<span class="w"> </span>INFO<span
class="w"> </span>ballista_executor<span class="o">]</span><span class="w">
</span>work_dir:<span class="w"> </span>/tmp/.tmpLVx39c
-ballista-executor_1<span class="w"> </span><span class="p">|</span><span
class="w"> </span><span class="o">[</span><span
class="m">2021</span>-08-28T15:55:22Z<span class="w"> </span>INFO<span
class="w"> </span>ballista_executor<span class="o">]</span><span class="w">
</span>concurrent_tasks:<span class="w"> </span><span class="m">4</span>
-ballista-scheduler_1<span class="w"> </span><span class="p">|</span><span
class="w"> </span><span class="o">[</span><span
class="m">2021</span>-08-28T15:55:22Z<span class="w"> </span>INFO<span
class="w"> </span>ballista_scheduler<span class="o">]</span><span class="w">
</span>Ballista<span class="w"> </span>v0.12.0<span class="w">
</span>Scheduler<span class="w"> </span>listening<span class="w">
</span>on<span class="w"> </span><span class="m">0</span>.0.0.0:50050
-ballista-executor_1<span class="w"> </span><span class="p">|</span><span
class="w"> </span><span class="o">[</span><span
class="m">2021</span>-08-28T15:55:22Z<span class="w"> </span>INFO<span
class="w"> </span>ballista_executor<span class="o">]</span><span class="w">
</span>Ballista<span class="w"> </span>v0.12.0<span class="w"> </span>Rust<span
class="w"> </span>Executor<span class="w"> </span>listening<span class="w">
</span>on<span class="w"> </span><span class="m">0</span>.0.0.0:50051
+Attaching<span class="w"> </span>to<span class="w">
</span>ballista-benchmarks_ballista-scheduler_1,<span class="w">
</span>ballista-benchmarks_ballista-executor_1
+ballista-scheduler_1<span class="w"> </span><span class="p">|</span><span
class="w"> </span>INFO<span class="w"> </span>ballista_scheduler:<span
class="w"> </span>Ballista<span class="w"> </span>v51.0.0<span class="w">
</span>Scheduler<span class="w"> </span>listening<span class="w">
</span>on<span class="w"> </span><span class="m">0</span>.0.0.0:50050
+ballista-executor_1<span class="w"> </span><span class="p">|</span><span
class="w"> </span>INFO<span class="w"> </span>ballista_executor:<span
class="w"> </span>Ballista<span class="w"> </span>v51.0.0<span class="w">
</span>Rust<span class="w"> </span>Executor<span class="w">
</span>listening<span class="w"> </span>on<span class="w"> </span><span
class="m">0</span>.0.0.0:50051
</pre></div>
</div>
<p>The scheduler listens on port 50050 and this is the port that clients will
need to connect to.</p>
diff --git a/user-guide/deployment/docker.html b/user-guide/deployment/docker.html
index 17192b13..c71f26af 100644
--- a/user-guide/deployment/docker.html
+++ b/user-guide/deployment/docker.html
@@ -369,13 +369,10 @@ a756055576f3
apache/datafusion-ballista-scheduler:latest "/root/schedul
</div>
<p>Run <code class="docutils literal notranslate"><span
class="pre">docker</span> <span class="pre">logs</span> <span
class="pre">CONTAINER_ID</span></code> to check the output from the process:</p>
<div class="highlight-default notranslate"><div
class="highlight"><pre><span></span>$ docker logs a756055576f3
-2024-02-03T14:49:47.904571Z INFO main ThreadId(01)
ballista_scheduler::cluster: Initializing Sled database in temp directory
-
-2024-02-03T14:49:47.924679Z INFO main ThreadId(01)
ballista_scheduler::scheduler_process: Ballista v0.12.0 Scheduler listening on
0.0.0.0:50050
-2024-02-03T14:49:47.924709Z INFO main ThreadId(01)
ballista_scheduler::scheduler_process: Starting Scheduler grpc server with task
scheduling policy of PullStaged
-2024-02-03T14:49:47.925261Z INFO main ThreadId(01)
ballista_scheduler::cluster::kv: Initializing heartbeat listener
-2024-02-03T14:49:47.925476Z INFO main ThreadId(01)
ballista_scheduler::scheduler_server::query_stage_scheduler: Starting
QueryStageScheduler
-2024-02-03T14:49:47.925587Z INFO tokio-runtime-worker ThreadId(47)
ballista_core::event_loop: Starting the event loop query_stage
+INFO ballista_scheduler::scheduler_process: Ballista v51.0.0 Scheduler
listening on 0.0.0.0:50050
+INFO ballista_scheduler::scheduler_process: Starting Scheduler grpc server
with task scheduling policy of PullStaged
+INFO ballista_scheduler::scheduler_server::query_stage_scheduler: Starting
QueryStageScheduler
+INFO ballista_core::event_loop: Starting the event loop query_stage
</pre></div>
</div>
</section>
@@ -396,11 +393,11 @@ a756055576f3
apache/datafusion-ballista-scheduler:latest "/root/schedul
</div>
<p>Use <code class="docutils literal notranslate"><span
class="pre">docker</span> <span class="pre">logs</span> <span
class="pre">CONTAINER_ID</span></code> to check the output from the
executor(s):</p>
<div class="highlight-default notranslate"><div
class="highlight"><pre><span></span>$ docker logs fb8b530cee6d
-2024-02-03T14:50:24.061607Z INFO main ThreadId(01)
ballista_executor::executor_process: Running with config:
-2024-02-03T14:50:24.061649Z INFO main ThreadId(01)
ballista_executor::executor_process: work_dir: /tmp/.tmpAkP3pZ
-2024-02-03T14:50:24.061655Z INFO main ThreadId(01)
ballista_executor::executor_process: concurrent_tasks: 48
-2024-02-03T14:50:24.063256Z INFO tokio-runtime-worker ThreadId(44)
ballista_executor::executor_process: Ballista v0.12.0 Rust Executor Flight
Server listening on 0.0.0.0:50051
-2024-02-03T14:50:24.063281Z INFO tokio-runtime-worker ThreadId(47)
ballista_executor::execution_loop: Starting poll work loop with scheduler
+INFO ballista_executor::executor_process: Running with config:
+INFO ballista_executor::executor_process: work_dir: /tmp/.tmpAkP3pZ
+INFO ballista_executor::executor_process: concurrent_tasks: 48
+INFO ballista_executor::executor_process: Ballista v51.0.0 Rust Executor
Flight Server listening on 0.0.0.0:50051
+INFO ballista_executor::execution_loop: Starting poll work loop with scheduler
</pre></div>
</div>
</section>
diff --git a/user-guide/deployment/kubernetes.html b/user-guide/deployment/kubernetes.html
index a52e78c8..11808791 100644
--- a/user-guide/deployment/kubernetes.html
+++ b/user-guide/deployment/kubernetes.html
@@ -368,10 +368,10 @@ must be enabled using the following command.</p>
<section id="publishing-docker-images">
<h2>Publishing Docker Images<a class="headerlink"
href="#publishing-docker-images" title="Link to this heading">¶</a></h2>
<p>Once the images have been built, you can retag them and can push them to
your favourite Docker registry.</p>
-<div class="highlight-bash notranslate"><div
class="highlight"><pre><span></span>docker<span class="w"> </span>tag<span
class="w"> </span>apache/datafusion-ballista-scheduler:0.12.0<span class="w">
</span><your-repo>/datafusion-ballista-scheduler:0.12.0
-docker<span class="w"> </span>tag<span class="w">
</span>apache/datafusion-ballista-executor:0.12.0<span class="w">
</span><your-repo>/datafusion-ballista-executor:0.12.0
-docker<span class="w"> </span>push<span class="w">
</span><your-repo>/datafusion-ballista-scheduler:0.12.0
-docker<span class="w"> </span>push<span class="w">
</span><your-repo>/datafusion-ballista-executor:0.12.0
+<div class="highlight-bash notranslate"><div
class="highlight"><pre><span></span>docker<span class="w"> </span>tag<span
class="w"> </span>apache/datafusion-ballista-scheduler:latest<span class="w">
</span><your-repo>/datafusion-ballista-scheduler:latest
+docker<span class="w"> </span>tag<span class="w">
</span>apache/datafusion-ballista-executor:latest<span class="w">
</span><your-repo>/datafusion-ballista-executor:latest
+docker<span class="w"> </span>push<span class="w">
</span><your-repo>/datafusion-ballista-scheduler:latest
+docker<span class="w"> </span>push<span class="w">
</span><your-repo>/datafusion-ballista-executor:latest
</pre></div>
</div>
</section>
@@ -453,7 +453,7 @@ persistentvolumeclaim/data-pv-claim<span class="w">
</span>created
<span class="w"> </span><span class="nt">spec</span><span class="p">:</span>
<span class="w"> </span><span class="nt">containers</span><span
class="p">:</span>
<span class="w"> </span><span class="p p-Indicator">-</span><span
class="w"> </span><span class="nt">name</span><span class="p">:</span><span
class="w"> </span><span class="l l-Scalar
l-Scalar-Plain">ballista-scheduler</span>
-<span class="w"> </span><span class="nt">image</span><span
class="p">:</span><span class="w"> </span><span class="l l-Scalar
l-Scalar-Plain"><your-repo>/datafusion-ballista-scheduler:0.12.0</span>
+<span class="w"> </span><span class="nt">image</span><span
class="p">:</span><span class="w"> </span><span class="l l-Scalar
l-Scalar-Plain"><your-repo>/datafusion-ballista-scheduler:latest</span>
<span class="w"> </span><span class="nt">args</span><span
class="p">:</span><span class="w"> </span><span class="p
p-Indicator">[</span><span class="s">"--bind-port=50050"</span><span
class="p p-Indicator">]</span>
<span class="w"> </span><span class="nt">ports</span><span
class="p">:</span>
<span class="w"> </span><span class="p p-Indicator">-</span><span
class="w"> </span><span class="nt">containerPort</span><span
class="p">:</span><span class="w"> </span><span class="l l-Scalar
l-Scalar-Plain">50050</span>
@@ -483,7 +483,7 @@ persistentvolumeclaim/data-pv-claim<span class="w">
</span>created
<span class="w"> </span><span class="nt">spec</span><span class="p">:</span>
<span class="w"> </span><span class="nt">containers</span><span
class="p">:</span>
<span class="w"> </span><span class="p p-Indicator">-</span><span
class="w"> </span><span class="nt">name</span><span class="p">:</span><span
class="w"> </span><span class="l l-Scalar
l-Scalar-Plain">ballista-executor</span>
-<span class="w"> </span><span class="nt">image</span><span
class="p">:</span><span class="w"> </span><span class="l l-Scalar
l-Scalar-Plain"><your-repo>/datafusion-ballista-executor:0.12.0</span>
+<span class="w"> </span><span class="nt">image</span><span
class="p">:</span><span class="w"> </span><span class="l l-Scalar
l-Scalar-Plain"><your-repo>/datafusion-ballista-executor:latest</span>
<span class="w"> </span><span class="nt">args</span><span
class="p">:</span>
<span class="w"> </span><span class="p p-Indicator">-</span><span
class="w"> </span><span class="s">"--bind-port=50051"</span>
<span class="w"> </span><span class="p p-Indicator">-</span><span
class="w"> </span><span
class="s">"--scheduler-host=ballista-scheduler"</span>
@@ -517,11 +517,11 @@ ballista-executor-78cc5b6486-7crdm<span class="w">
</span><span class="m">0</s
ballista-scheduler-879f874c5-rnbd6<span class="w"> </span><span
class="m">0</span>/1<span class="w"> </span>Pending<span class="w">
</span><span class="m">0</span><span class="w"> </span>42s
</pre></div>
</div>
-<p>You can view the scheduler logs with <code class="docutils literal
notranslate"><span class="pre">kubectl</span> <span class="pre">logs</span>
<span class="pre">ballista-scheduler-0</span></code>:</p>
-<div class="highlight-default notranslate"><div
class="highlight"><pre><span></span>$ kubectl logs ballista-scheduler-0
-[2021-02-19T00:24:01Z INFO scheduler] Ballista v0.7.0 Scheduler listening on
0.0.0.0:50050
-[2021-02-19T00:24:16Z INFO ballista::scheduler] Received register_executor
request for ExecutorMetadata { id:
"b5e81711-1c5c-46ec-8522-d8b359793188", host:
"10.1.23.149", port: 50051 }
-[2021-02-19T00:24:17Z INFO ballista::scheduler] Received register_executor
request for ExecutorMetadata { id:
"816e4502-a876-4ed8-b33f-86d243dcf63f", host:
"10.1.23.150", port: 50051 }
+<p>You can view the scheduler logs with <code class="docutils literal
notranslate"><span class="pre">kubectl</span> <span class="pre">logs</span>
<span class="pre">ballista-scheduler-<pod-id></span></code>:</p>
+<div class="highlight-default notranslate"><div
class="highlight"><pre><span></span>$ kubectl logs
ballista-scheduler-<pod-id>
+INFO ballista_scheduler::scheduler_process: Ballista v51.0.0 Scheduler
listening on 0.0.0.0:50050
+INFO ballista_scheduler::scheduler_server::grpc: Received register_executor
request for ExecutorMetadata { id:
"b5e81711-1c5c-46ec-8522-d8b359793188", host:
"10.1.23.149", port: 50051 }
+INFO ballista_scheduler::scheduler_server::grpc: Received register_executor
request for ExecutorMetadata { id:
"816e4502-a876-4ed8-b33f-86d243dcf63f", host:
"10.1.23.150", port: 50051 }
</pre></div>
</div>
</section>
diff --git a/user-guide/python.html b/user-guide/python.html
index 7a179e9a..476975e7 100644
--- a/user-guide/python.html
+++ b/user-guide/python.html
@@ -437,12 +437,8 @@ COUNT(UInt8(1)): int64]
<div class="highlight-python notranslate"><div
class="highlight"><pre><span></span><span class="kn">from</span><span
class="w"> </span><span class="nn">ballista</span><span class="w"> </span><span
class="kn">import</span> <span class="n">BallistaBuilder</span>
<span class="kn">import</span><span class="w"> </span><span
class="nn">pyarrow</span>
-<span class="c1"># an alias</span>
-<span class="c1"># TODO implement Functions</span>
-<span class="n">f</span> <span class="o">=</span> <span
class="n">ballista</span><span class="o">.</span><span
class="n">functions</span>
-
<span class="c1"># create a context</span>
-<span class="n">ctx</span> <span class="o">=</span> <span
class="n">Ballista</span><span class="p">()</span><span class="o">.</span><span
class="n">standalone</span><span class="p">()</span>
+<span class="n">ctx</span> <span class="o">=</span> <span
class="n">BallistaBuilder</span><span class="p">()</span><span
class="o">.</span><span class="n">standalone</span><span class="p">()</span>
<span class="c1"># create a RecordBatch and a new DataFrame from it</span>
<span class="n">batch</span> <span class="o">=</span> <span
class="n">pyarrow</span><span class="o">.</span><span
class="n">RecordBatch</span><span class="o">.</span><span
class="n">from_arrays</span><span class="p">(</span>
@@ -452,9 +448,10 @@ COUNT(UInt8(1)): int64]
<span class="n">df</span> <span class="o">=</span> <span
class="n">ctx</span><span class="o">.</span><span
class="n">create_dataframe</span><span class="p">([[</span><span
class="n">batch</span><span class="p">]])</span>
<span class="c1"># create a new statement</span>
+<span class="kn">from</span><span class="w"> </span><span
class="nn">datafusion</span><span class="w"> </span><span
class="kn">import</span> <span class="n">col</span>
<span class="n">df</span> <span class="o">=</span> <span
class="n">df</span><span class="o">.</span><span class="n">select</span><span
class="p">(</span>
- <span class="n">f</span><span class="o">.</span><span
class="n">col</span><span class="p">(</span><span
class="s2">"a"</span><span class="p">)</span> <span
class="o">+</span> <span class="n">f</span><span class="o">.</span><span
class="n">col</span><span class="p">(</span><span
class="s2">"b"</span><span class="p">),</span>
- <span class="n">f</span><span class="o">.</span><span
class="n">col</span><span class="p">(</span><span
class="s2">"a"</span><span class="p">)</span> <span
class="o">-</span> <span class="n">f</span><span class="o">.</span><span
class="n">col</span><span class="p">(</span><span
class="s2">"b"</span><span class="p">),</span>
+ <span class="n">col</span><span class="p">(</span><span
class="s2">"a"</span><span class="p">)</span> <span
class="o">+</span> <span class="n">col</span><span class="p">(</span><span
class="s2">"b"</span><span class="p">),</span>
+ <span class="n">col</span><span class="p">(</span><span
class="s2">"a"</span><span class="p">)</span> <span
class="o">-</span> <span class="n">col</span><span class="p">(</span><span
class="s2">"b"</span><span class="p">),</span>
<span class="p">)</span>
<span class="c1"># execute and collect the first (and only) batch</span>
diff --git a/user-guide/tuning-guide.html b/user-guide/tuning-guide.html
index e6e94613..b2485c51 100644
--- a/user-guide/tuning-guide.html
+++ b/user-guide/tuning-guide.html
@@ -368,8 +368,8 @@ memory, as well as supporting spill-to-disk to reduce
memory pressure.</p>
<p>Ballista supports both push-based and pull-based task scheduling. It is
recommended that you try both to determine
which is the best for your use case.</p>
<p>Pull-based scheduling works in a similar way to Apache Spark and push-based
scheduling can result in lower latency.</p>
-<p>The scheduling policy can be specified in the <code class="docutils literal
notranslate"><span class="pre">--scheduler_policy</span></code> parameter when
starting the scheduler and executor
-processes. The default is <code class="docutils literal notranslate"><span
class="pre">pull-based</span></code>.</p>
+<p>The scheduling policy can be specified in the <code class="docutils literal
notranslate"><span class="pre">--scheduler-policy</span></code> parameter when
starting the scheduler and executor
+processes. The default is <code class="docutils literal notranslate"><span
class="pre">pull-staged</span></code>.</p>
</section>
<section id="viewing-query-plans-and-metrics">
<h2>Viewing Query Plans and Metrics<a class="headerlink"
href="#viewing-query-plans-and-metrics" title="Link to this heading">¶</a></h2>