martin-g commented on code in PR #1881:
URL: https://github.com/apache/datafusion-ballista/pull/1881#discussion_r3473711818
##########
docs/source/user-guide/deployment/docker-compose.md:
##########
@@ -19,37 +19,51 @@
# Starting a Ballista Cluster using Docker Compose
-Docker Compose is a convenient way to launch a cluster when testing locally.
+Two Compose files are provided. Choose based on whether you need the last
stable release
+or want to run against local source changes.
-## Build Docker Images
+## Option 1: Pre-built images (no local build required)
-To create the required Docker images please refer to the [docker deployment
page](docker.md).
+`docker-compose.quick.yml` pulls images directly from GHCR — no Rust toolchain
needed.
+Images are published on each stable release; `latest` tracks the most recent
release,
Review Comment:
```suggestion
Images are published for each release of Ballista; `latest` tracks the most recent release,
```
##########
docs/source/user-guide/deployment/quick-start.md:
##########
@@ -19,128 +19,173 @@
# Ballista Quickstart
-A simple way to start a local cluster for testing purposes is to use cargo to
build the project and then run the scheduler and executor binaries directly.
+There are two ways to get a local Ballista cluster running. Choose based on
your goal:
-Project Requirements:
+| | [Evaluate Ballista](#path-a-evaluate-with-docker) |
[Build from source](#path-b-build-from-source) |
+| ---------------- | ------------------------------------------------- |
---------------------------------------------- |
+| Goal | Try Ballista against the last stable release |
Develop or test against local code changes |
+| Prerequisites | Docker | Rust,
protoc |
+| Cold start | Image pull | Full
compile |
+| Terminals needed | 1 | 3
|
+
+> [!IMPORTANT]
+> Ballista and DataFusion are developed independently. A given Ballista
release may not be compatible
+> with the latest DataFusion version. Check the [compatibility
matrix](../configs.md) before integrating.
+
+---
+
+## Path A: Evaluate with Docker
+
+The only prerequisite is [Docker](https://docs.docker.com/get-docker/) with
Compose v2.
+
+This uses pre-built images from GHCR that are published on each stable
release. The `latest` tag
+tracks the most recent release, not the `main` branch.
+
+```shell
+docker compose -f docker-compose.quick.yml up
+```
+
+You should see output similar to:
+
+```
+ballista-scheduler-1 | Ballista Scheduler v53.0.0 listening on 0.0.0.0:50050
+ballista-executor-1 | Executor registration succeed
+ballista-executor-2 | Executor registration succeed
+```
+
+Two executors start by default. The scheduler listens on `localhost:50050`.
+
+**Connect from Rust:**
+
+```rust
+let ctx = SessionContext::remote("df://localhost:50050").await?;
+```
+
+**Connect from the CLI** (requires a local build — no pre-built CLI image is
published):
+
+```shell
+cargo run -p ballista-cli -- --host localhost --port 50050
+```
+
+**To make local data available inside the executors**, uncomment and set the
`volumes` block
+in `docker-compose.quick.yml`:
+
+```yaml
+ballista-executor:
+ volumes:
+ - /absolute/path/to/your/data:/absolute/path/to/your/data:ro
Review Comment:
Using the same path for the host folder and the container folder is confusing,
IMO.
```suggestion
- /path/to/your/data/in/the/host:/path/to/your/data/in/the/container:ro
```
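To illustrate why distinct placeholders help, here is a minimal shell sketch that splits a short-syntax volume entry into its host, container, and mode parts. The helper name `split_volume` is hypothetical, the paths are the placeholders from the suggestion above, and the sketch assumes neither path contains a colon.

```shell
#!/bin/sh
# Hypothetical helper: split a "host:container[:mode]" volume entry.
# Assumes neither path contains a colon (so no Windows drive letters).
split_volume() {
  spec=$1
  host=${spec%%:*}          # text before the first colon
  rest=${spec#*:}           # text after the first colon
  container=${rest%%:*}     # container-side path
  case $rest in
    *:*) mode=${rest#*:} ;; # optional access mode, e.g. ro
    *)   mode=rw ;;         # Docker mounts read-write when no mode is given
  esac
  printf 'host=%s container=%s mode=%s\n' "$host" "$container" "$mode"
}

split_volume "/path/to/your/data/in/the/host:/path/to/your/data/in/the/container:ro"
```

The `container` value is the one that, per the quoted docs, must match the path passed to `register_parquet` or `register_csv`; the `host` value only matters to Docker.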
##########
docs/source/user-guide/deployment/docker-compose.md:
##########
@@ -19,37 +19,51 @@
# Starting a Ballista Cluster using Docker Compose
-Docker Compose is a convenient way to launch a cluster when testing locally.
+Two Compose files are provided. Choose based on whether you need the last
stable release
Review Comment:
```suggestion
Two docker-compose.yml files are provided. Choose based on whether you need the last stable release
```
##########
docs/source/user-guide/deployment/docker-compose.md:
##########
@@ -19,37 +19,51 @@
# Starting a Ballista Cluster using Docker Compose
-Docker Compose is a convenient way to launch a cluster when testing locally.
+Two Compose files are provided. Choose based on whether you need the last
stable release
+or want to run against local source changes.
-## Build Docker Images
+## Option 1: Pre-built images (no local build required)
-To create the required Docker images please refer to the [docker deployment
page](docker.md).
+`docker-compose.quick.yml` pulls images directly from GHCR — no Rust toolchain
needed.
Review Comment:
```suggestion
`docker-compose.prebuilt.yml` pulls images directly from GitHub Container Registry (GHCR) — no Rust toolchain needed.
```
1) Do not use abbreviations that have not been introduced earlier.
2) Rename `quick` to `prebuilt`?
##########
docs/source/user-guide/deployment/quick-start.md:
##########
@@ -19,128 +19,173 @@
# Ballista Quickstart
-A simple way to start a local cluster for testing purposes is to use cargo to
build the project and then run the scheduler and executor binaries directly.
+There are two ways to get a local Ballista cluster running. Choose based on
your goal:
-Project Requirements:
+| | [Evaluate Ballista](#path-a-evaluate-with-docker) |
[Build from source](#path-b-build-from-source) |
+| ---------------- | ------------------------------------------------- |
---------------------------------------------- |
+| Goal | Try Ballista against the last stable release |
Develop or test against local code changes |
+| Prerequisites | Docker | Rust,
protoc |
+| Cold start | Image pull | Full
compile |
+| Terminals needed | 1 | 3
|
+
+> [!IMPORTANT]
+> Ballista and DataFusion are developed independently. A given Ballista
release may not be compatible
+> with the latest DataFusion version. Check the [compatibility
matrix](../configs.md) before integrating.
+
+---
+
+## Path A: Evaluate with Docker
+
+The only prerequisite is [Docker](https://docs.docker.com/get-docker/) with
Compose v2.
+
+This uses pre-built images from GHCR that are published on each stable
release. The `latest` tag
+tracks the most recent release, not the `main` branch.
+
+```shell
+docker compose -f docker-compose.quick.yml up
+```
+
+You should see output similar to:
+
+```
+ballista-scheduler-1 | Ballista Scheduler v53.0.0 listening on 0.0.0.0:50050
+ballista-executor-1 | Executor registration succeed
+ballista-executor-2 | Executor registration succeed
+```
+
+Two executors start by default. The scheduler listens on `localhost:50050`.
+
+**Connect from Rust:**
+
+```rust
+let ctx = SessionContext::remote("df://localhost:50050").await?;
+```
+
+**Connect from the CLI** (requires a local build — no pre-built CLI image is
published):
+
+```shell
+cargo run -p ballista-cli -- --host localhost --port 50050
+```
+
+**To make local data available inside the executors**, uncomment and set the
`volumes` block
+in `docker-compose.quick.yml`:
+
+```yaml
+ballista-executor:
+ volumes:
+ - /absolute/path/to/your/data:/absolute/path/to/your/data:ro
+```
+
+The container-side path must match exactly what you pass to `register_parquet`
or
+`register_csv` — the scheduler stores that path and sends it to the executor
as-is.
+
+**Tear down:**
+
+```shell
+docker compose -f docker-compose.quick.yml down
+```
+
+---
+
+## Path B: Build from source
+
+Use this path if you need to test local code changes or run against the `main`
branch.
+
+**Prerequisites:**
- [Rust](https://www.rust-lang.org/tools/install)
- [Protobuf Compiler](https://protobuf.dev/downloads/)
-## Build the project
-
-From the root of the project, build release binaries.
+**Step 1:** Build release binaries from the repository root:
```shell
cargo build --release
```
-Start a Ballista scheduler process in a new terminal session.
+**Step 2:** Start the scheduler in a new terminal:
```shell
RUST_LOG=info ./target/release/ballista-scheduler
```
-Start one or more Ballista executor processes in new terminal sessions. When
starting more than one
-executor, a unique port number must be specified for each executor.
+**Step 3:** Start one or more executors, each in a new terminal. When running
multiple
+executors, each needs a unique pair of ports:
```shell
RUST_LOG=info ./target/release/ballista-executor -c 2 -p 50051
--bind-grpc-port 50052
Review Comment:
```suggestion
RUST_LOG=info ./target/release/ballista-executor --concurrent-tasks 2 --port 50051 --bind-grpc-port 50052
```
Use the long option names for clarity.
##########
docs/source/user-guide/deployment/docker-compose.md:
##########
@@ -19,37 +19,51 @@
# Starting a Ballista Cluster using Docker Compose
-Docker Compose is a convenient way to launch a cluster when testing locally.
+Two Compose files are provided. Choose based on whether you need the last
stable release
+or want to run against local source changes.
-## Build Docker Images
+## Option 1: Pre-built images (no local build required)
-To create the required Docker images please refer to the [docker deployment
page](docker.md).
+`docker-compose.quick.yml` pulls images directly from GHCR — no Rust toolchain
needed.
+Images are published on each stable release; `latest` tracks the most recent
release,
+not the `main` branch.
-## Start a Cluster
+```bash
+docker compose -f docker-compose.quick.yml up
+```
+
+See the [quickstart guide](quick-start.md) for connection instructions and
data volume setup.
-Using the
[docker-compose.yml](https://github.com/apache/datafusion-ballista/blob/main/docker-compose.yml)
from the
-source repository, run the following command to start a cluster:
+## Option 2: Build from source
+
+`docker-compose.yml` builds executor and scheduler images from the local
Dockerfiles.
Review Comment:
I'd suggest renaming `docker-compose.yml` to something that tells users
they need to build Ballista locally, e.g. `docker-compose.local.yml`.
##########
docs/source/user-guide/deployment/docker-compose.md:
##########
@@ -19,37 +19,51 @@
# Starting a Ballista Cluster using Docker Compose
-Docker Compose is a convenient way to launch a cluster when testing locally.
+Two Compose files are provided. Choose based on whether you need the last
stable release
+or want to run against local source changes.
-## Build Docker Images
+## Option 1: Pre-built images (no local build required)
-To create the required Docker images please refer to the [docker deployment
page](docker.md).
+`docker-compose.quick.yml` pulls images directly from GHCR — no Rust toolchain
needed.
+Images are published on each stable release; `latest` tracks the most recent
release,
+not the `main` branch.
-## Start a Cluster
+```bash
+docker compose -f docker-compose.quick.yml up
+```
+
+See the [quickstart guide](quick-start.md) for connection instructions and
data volume setup.
-Using the
[docker-compose.yml](https://github.com/apache/datafusion-ballista/blob/main/docker-compose.yml)
from the
-source repository, run the following command to start a cluster:
+## Option 2: Build from source
+
+`docker-compose.yml` builds executor and scheduler images from the local
Dockerfiles.
+The Dockerfiles copy pre-compiled binaries — they do **not** run `cargo build`
themselves.
Review Comment:
I wonder whether the Dockerfile should run the build itself instead of copying
pre-compiled binaries. Assuming the local Git clone is mounted, the build would
reuse whatever is already in the user's `target/` folder and rebuild incrementally.
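
A rough sketch of the mounted-clone idea, using `docker run` rather than a change to the Dockerfile itself. The `rust:latest` image tag, the `/src` mount point, and the helper name `build_in_container_cmd` are assumptions for illustration and are not part of this PR. The image would also need the Protobuf compiler, which the stock image may not include.

```shell
#!/bin/sh
# Sketch only: build inside a container while reusing the clone's target/ folder.
# Assumptions (not from the PR): the rust:latest image tag and the /src mount point.
# The helper only prints the command so it can be reviewed before running.
build_in_container_cmd() {
  repo_dir=$1
  printf 'docker run --rm -v %s:/src -w /src rust:latest cargo build --release\n' "$repo_dir"
}

# Print the command for the current directory (expected to be the repository root).
build_in_container_cmd "$PWD"
```

Because the clone is bind-mounted, build artifacts land in the clone's own `target/` folder. Artifacts built on a different host platform would likely not be reusable inside the container.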
##########
docs/source/user-guide/deployment/quick-start.md:
##########
@@ -19,128 +19,173 @@
# Ballista Quickstart
-A simple way to start a local cluster for testing purposes is to use cargo to
build the project and then run the scheduler and executor binaries directly.
+There are two ways to get a local Ballista cluster running. Choose based on
your goal:
-Project Requirements:
+| | [Evaluate Ballista](#path-a-evaluate-with-docker) |
[Build from source](#path-b-build-from-source) |
+| ---------------- | ------------------------------------------------- |
---------------------------------------------- |
+| Goal | Try Ballista against the last stable release |
Develop or test against local code changes |
+| Prerequisites | Docker | Rust,
protoc |
+| Cold start | Image pull | Full
compile |
+| Terminals needed | 1 | 3
|
+
+> [!IMPORTANT]
+> Ballista and DataFusion are developed independently. A given Ballista
release may not be compatible
+> with the latest DataFusion version. Check the [compatibility
matrix](../configs.md) before integrating.
+
+---
+
+## Path A: Evaluate with Docker
+
+The only prerequisite is [Docker](https://docs.docker.com/get-docker/) with
Compose v2.
+
+This uses pre-built images from GHCR that are published on each stable
release. The `latest` tag
+tracks the most recent release, not the `main` branch.
+
+```shell
+docker compose -f docker-compose.quick.yml up
+```
+
+You should see output similar to:
+
+```
+ballista-scheduler-1 | Ballista Scheduler v53.0.0 listening on 0.0.0.0:50050
+ballista-executor-1 | Executor registration succeed
+ballista-executor-2 | Executor registration succeed
+```
+
+Two executors start by default. The scheduler listens on `localhost:50050`.
+
+**Connect from Rust:**
+
+```rust
+let ctx = SessionContext::remote("df://localhost:50050").await?;
+```
+
+**Connect from the CLI** (requires a local build — no pre-built CLI image is
published):
+
+```shell
+cargo run -p ballista-cli -- --host localhost --port 50050
+```
+
+**To make local data available inside the executors**, uncomment and set the
`volumes` block
+in `docker-compose.quick.yml`:
+
+```yaml
+ballista-executor:
+ volumes:
+ - /absolute/path/to/your/data:/absolute/path/to/your/data:ro
+```
+
+The container-side path must match exactly what you pass to `register_parquet`
or
+`register_csv` — the scheduler stores that path and sends it to the executor
as-is.
+
+**Tear down:**
+
+```shell
+docker compose -f docker-compose.quick.yml down
+```
+
+---
+
+## Path B: Build from source
+
+Use this path if you need to test local code changes or run against the `main`
branch.
+
+**Prerequisites:**
- [Rust](https://www.rust-lang.org/tools/install)
- [Protobuf Compiler](https://protobuf.dev/downloads/)
-## Build the project
-
-From the root of the project, build release binaries.
+**Step 1:** Build release binaries from the repository root:
```shell
cargo build --release
```
-Start a Ballista scheduler process in a new terminal session.
+**Step 2:** Start the scheduler in a new terminal:
```shell
RUST_LOG=info ./target/release/ballista-scheduler
```
-Start one or more Ballista executor processes in new terminal sessions. When
starting more than one
-executor, a unique port number must be specified for each executor.
+**Step 3:** Start one or more executors, each in a new terminal. When running
multiple
+executors, each needs a unique pair of ports:
```shell
RUST_LOG=info ./target/release/ballista-executor -c 2 -p 50051
--bind-grpc-port 50052
+```
+```shell
RUST_LOG=info ./target/release/ballista-executor -c 2 -p 50053
--bind-grpc-port 50054
```
+> **Two-port model:** each executor exposes an Arrow Flight port (data, `-p`)
and a gRPC
Review Comment:
```suggestion
> **Two-port model:** each executor exposes an Arrow Flight port (data, `--port`) and a gRPC
```
##########
docs/source/user-guide/deployment/quick-start.md:
##########
@@ -19,128 +19,173 @@
# Ballista Quickstart
-A simple way to start a local cluster for testing purposes is to use cargo to
build the project and then run the scheduler and executor binaries directly.
+There are two ways to get a local Ballista cluster running. Choose based on
your goal:
-Project Requirements:
+| | [Evaluate Ballista](#path-a-evaluate-with-docker) |
[Build from source](#path-b-build-from-source) |
+| ---------------- | ------------------------------------------------- |
---------------------------------------------- |
+| Goal | Try Ballista against the last stable release |
Develop or test against local code changes |
+| Prerequisites | Docker | Rust,
protoc |
+| Cold start | Image pull | Full
compile |
+| Terminals needed | 1 | 3
|
+
+> [!IMPORTANT]
+> Ballista and DataFusion are developed independently. A given Ballista
release may not be compatible
+> with the latest DataFusion version. Check the [compatibility
matrix](../configs.md) before integrating.
+
+---
+
+## Path A: Evaluate with Docker
+
+The only prerequisite is [Docker](https://docs.docker.com/get-docker/) with
Compose v2.
+
+This uses pre-built images from GHCR that are published on each stable
release. The `latest` tag
+tracks the most recent release, not the `main` branch.
+
+```shell
+docker compose -f docker-compose.quick.yml up
+```
+
+You should see output similar to:
+
+```
+ballista-scheduler-1 | Ballista Scheduler v53.0.0 listening on 0.0.0.0:50050
+ballista-executor-1 | Executor registration succeed
+ballista-executor-2 | Executor registration succeed
+```
+
+Two executors start by default. The scheduler listens on `localhost:50050`.
+
+**Connect from Rust:**
+
+```rust
+let ctx = SessionContext::remote("df://localhost:50050").await?;
+```
+
+**Connect from the CLI** (requires a local build — no pre-built CLI image is
published):
+
+```shell
+cargo run -p ballista-cli -- --host localhost --port 50050
+```
+
+**To make local data available inside the executors**, uncomment and set the
`volumes` block
+in `docker-compose.quick.yml`:
+
+```yaml
+ballista-executor:
+ volumes:
+ - /absolute/path/to/your/data:/absolute/path/to/your/data:ro
+```
+
+The container-side path must match exactly what you pass to `register_parquet`
or
+`register_csv` — the scheduler stores that path and sends it to the executor
as-is.
+
+**Tear down:**
+
+```shell
+docker compose -f docker-compose.quick.yml down
+```
+
+---
+
+## Path B: Build from source
+
+Use this path if you need to test local code changes or run against the `main`
branch.
+
+**Prerequisites:**
- [Rust](https://www.rust-lang.org/tools/install)
- [Protobuf Compiler](https://protobuf.dev/downloads/)
-## Build the project
-
-From the root of the project, build release binaries.
+**Step 1:** Build release binaries from the repository root:
```shell
cargo build --release
```
-Start a Ballista scheduler process in a new terminal session.
+**Step 2:** Start the scheduler in a new terminal:
```shell
RUST_LOG=info ./target/release/ballista-scheduler
```
-Start one or more Ballista executor processes in new terminal sessions. When
starting more than one
-executor, a unique port number must be specified for each executor.
+**Step 3:** Start one or more executors, each in a new terminal. When running
multiple
+executors, each needs a unique pair of ports:
```shell
RUST_LOG=info ./target/release/ballista-executor -c 2 -p 50051
--bind-grpc-port 50052
+```
+```shell
RUST_LOG=info ./target/release/ballista-executor -c 2 -p 50053
--bind-grpc-port 50054
Review Comment:
```suggestion
RUST_LOG=info ./target/release/ballista-executor --concurrent-tasks 2 --port 50053 --bind-grpc-port 50054
```
##########
docker-compose.quick.yml:
##########
@@ -0,0 +1,86 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+# Quick-start cluster using pre-built images from GHCR.
+# No local build required — Docker is the only prerequisite.
+#
+# Runs the last stable release. To test against unreleased changes,
+# use docker-compose.yml (requires `cargo build --release` first).
+#
+# Usage:
+# docker compose -f docker-compose.quick.yml up
+#
+# Connect from Rust:
+# SessionContext::remote("df://localhost:50050").await?
+#
+# Connect from the CLI:
+# cargo run -p ballista-cli -- --host localhost --port 50050
+#
+# To make local data available inside executors, uncomment and
+# set the volume path under ballista-executor. The container-side
+# path must match exactly what you pass to register_parquet/register_csv
+# since the scheduler stores and forwards that path to executors as-is:
+# volumes:
+# - /absolute/path/to/your/data:/absolute/path/to/your/data:ro
+
+services:
+ ballista-scheduler:
+ image: ghcr.io/apache/datafusion-ballista-scheduler:latest
+ # --advertise-flight-sql-endpoint enables the scheduler to proxy all
+ # result fetching so clients only ever connect to port 50050.
+ # Without this flag, clients would need direct access to each
+ # executor's Arrow Flight port, which breaks in Docker networking.
+ command: >
+ --bind-host 0.0.0.0
+ --external-host ballista-scheduler
+ --advertise-flight-sql-endpoint
+ ports:
+ - "50050:50050"
+ environment:
+ - RUST_LOG=ballista=info,ballista_scheduler=info
+ healthcheck:
+ test: ["CMD", "bash", "-c", "</dev/tcp/127.0.0.1/50050"]
+ interval: 5s
+ timeout: 5s
+ retries: 10
+ restart: "no"
+
+ ballista-executor:
+ image: ghcr.io/apache/datafusion-ballista-executor:latest
+ command: >
+ --bind-host 0.0.0.0
+ --scheduler-host ballista-scheduler
+ --concurrent-tasks 4
+ --work-dir /work
+ environment:
+ - RUST_LOG=ballista=info,ballista_executor=info
+ # Uncomment to mount local data for queries:
+ # volumes:
+ # - /absolute/path/to/your/data:/data:ro
Review Comment:
```suggestion
# - /path/to/your/data:/data:ro
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]