martin-g commented on code in PR #1881:
URL: https://github.com/apache/datafusion-ballista/pull/1881#discussion_r3440916745


##########
docs/source/user-guide/deployment/quick-start.md:
##########
@@ -19,128 +19,167 @@
 
 # Ballista Quickstart
 
-A simple way to start a local cluster for testing purposes is to use cargo to build the project and then run the scheduler and executor binaries directly.
+There are two ways to get a local Ballista cluster running. Choose based on your goal:
 
-Project Requirements:
+| | [Evaluate Ballista](#path-a-evaluate-with-docker-2-min) | [Build from source](#path-b-build-from-source-20-min) |
+|---|---|---|
+| Goal | Try Ballista against the last stable release | Develop or test against local code changes |
+| Prerequisites | Docker | Rust, protoc |
+| Cold start time | ~2 min (image pull) | ~20 min (full compile) |

Review Comment:
   Again, the times here depend on your network speed and your hardware ...



##########
docs/source/user-guide/deployment/docker-compose.md:
##########
@@ -19,37 +19,51 @@
 
 # Starting a Ballista Cluster using Docker Compose
 
-Docker Compose is a convenient way to launch a cluster when testing locally.
+Two Compose files are provided. Choose based on whether you need the last stable release
+or want to run against local source changes.
 
-## Build Docker Images
+## Option 1: Pre-built images (no local build required)
 
-To create the required Docker images please refer to the [docker deployment page](docker.md).
+`docker-compose.quick.yml` pulls images directly from GHCR — no Rust toolchain needed.
+Images are published on each stable release; `latest` tracks the most recent release,
+not the `main` branch.
 
-## Start a Cluster
+```bash
+docker compose -f docker-compose.quick.yml up
+```
+
+See the [quickstart guide](quick-start.md) for connection instructions and data volume setup.
 
-Using the [docker-compose.yml](https://github.com/apache/datafusion-ballista/blob/main/docker-compose.yml) from the
-source repository, run the following command to start a cluster:
+## Option 2: Build from source
+
+`docker-compose.yml` builds executor and scheduler images from the local Dockerfiles.
+The Dockerfiles copy pre-compiled binaries — they do **not** run `cargo build` themselves.
+You must compile first:
 
 ```bash
-docker-compose up --build
+# Step 1 — compile (requires Rust + protoc, takes ~20 min cold)

Review Comment:
   `~20 min` depends on your hardware.
   For me, a clean build of the whole project (`cargo clean && cargo build --release`) takes 3 minutes.
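
   One way to avoid quoting a fixed number is to let readers measure the build on their own machine. The snippet below is only a sketch: `elapsed_seconds` is a hypothetical helper, not something that exists in the Ballista repository, and it reports whole seconds only.

```shell
# Hypothetical helper (not part of Ballista): run a command and print the
# wall-clock seconds it took, so a locally measured time can be reported.
elapsed_seconds() {
  start=$(date +%s)
  "$@" >/dev/null 2>&1
  end=$(date +%s)
  echo $((end - start))
}

# On a real checkout one would run: elapsed_seconds cargo build --release
# A harmless stand-in command keeps this sketch runnable anywhere:
elapsed_seconds sleep 1
```

   The measured value will differ between machines, which is the point of the comment above.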



##########
docs/source/user-guide/deployment/docker-compose.md:
##########
@@ -19,37 +19,51 @@
 
 # Starting a Ballista Cluster using Docker Compose
 
-Docker Compose is a convenient way to launch a cluster when testing locally.
+Two Compose files are provided. Choose based on whether you need the last stable release
+or want to run against local source changes.
 
-## Build Docker Images
+## Option 1: Pre-built images (no local build required)
 
-To create the required Docker images please refer to the [docker deployment page](docker.md).
+`docker-compose.quick.yml` pulls images directly from GHCR — no Rust toolchain needed.
+Images are published on each stable release; `latest` tracks the most recent release,
+not the `main` branch.
 
-## Start a Cluster
+```bash
+docker compose -f docker-compose.quick.yml up
+```
+
+See the [quickstart guide](quick-start.md) for connection instructions and data volume setup.
 
-Using the [docker-compose.yml](https://github.com/apache/datafusion-ballista/blob/main/docker-compose.yml) from the
-source repository, run the following command to start a cluster:
+## Option 2: Build from source
+
+`docker-compose.yml` builds executor and scheduler images from the local Dockerfiles.
+The Dockerfiles copy pre-compiled binaries — they do **not** run `cargo build` themselves.
+You must compile first:
 
 ```bash
-docker-compose up --build
+# Step 1 — compile (requires Rust + protoc, takes ~20 min cold)
+cargo build --release
+
+# Step 2 — build Docker images and start the cluster
+docker compose up --build
 ```
 
-This should show output similar to the following:
+Skipping Step 1 will cause the build to fail because the `COPY target/release/ballista-*`
+instruction in the Dockerfiles will find no binaries to copy.
+
+Expected output after a successful start:
 
-```bash
-$ docker-compose up
-Creating network "ballista-benchmarks_default" with the default driver
-Creating ballista-benchmarks_ballista-scheduler_1 ... done
-Creating ballista-benchmarks_ballista-executor_1  ... done
-Attaching to ballista-benchmarks_ballista-scheduler_1, ballista-benchmarks_ballista-executor_1
-ballista-scheduler_1  | INFO ballista_scheduler: Ballista v52.0.0 Scheduler listening on 0.0.0.0:50050
-ballista-executor_1   | INFO ballista_executor: Ballista v52.0.0 Rust Executor listening on 0.0.0.0:50051
+```
+ballista-scheduler_1  | Ballista Scheduler listening on 0.0.0.0:50050

Review Comment:
   Which version of Docker Compose do you use?
   v2 prints the names with a `...-N` suffix, not `..._N`.
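
   To illustrate the difference, here is a small sketch. `compose_log_prefix` is a hypothetical helper written for this comment, not a Docker or Ballista command; it only mimics how the two major versions join the service name and the replica index by default.

```shell
# Hypothetical helper: build the log prefix Compose prints for a service replica.
# Compose v1 joins the service name and index with "_"; Compose v2 uses "-".
compose_log_prefix() {
  major="$1"; service="$2"; index="$3"
  if [ "$major" = "1" ]; then sep="_"; else sep="-"; fi
  printf '%s%s%s\n' "$service" "$sep" "$index"
}

compose_log_prefix 1 ballista-scheduler 1   # prints ballista-scheduler_1
compose_log_prefix 2 ballista-scheduler 1   # prints ballista-scheduler-1
```

   Since this page tells readers to run `docker compose` (v2), the expected output should use the second form.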



##########
docker-compose.quick.yml:
##########
@@ -0,0 +1,82 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+# Quick-start cluster using pre-built images from GHCR.
+# No local build required — Docker is the only prerequisite.
+#
+# Runs the last stable release. To test against unreleased changes,
+# use docker-compose.yml (requires `cargo build --release` first).
+#
+# Usage:
+#   docker compose -f docker-compose.quick.yml up
+#
+# Connect from Rust:
+#   SessionContext::remote("df://localhost:50050").await?
+#
+# Connect from the CLI:
+#   cargo run -p ballista-cli -- --host localhost --port 50050
+#
+# To make local data available inside executors, uncomment and
+# set the volume path under ballista-executor:
+#   volumes:
+#     - /absolute/path/to/your/data:/data:ro
+
+services:
+  ballista-scheduler:
+    image: ghcr.io/apache/datafusion-ballista-scheduler:latest
+    # --advertise-flight-sql-endpoint enables the scheduler to proxy all
+    # result fetching so clients only ever connect to port 50050.
+    # Without this flag, clients would need direct access to each
+    # executor's Arrow Flight port, which breaks in Docker networking.
+    command: >
+      --bind-host 0.0.0.0
+      --external-host ballista-scheduler
+      --advertise-flight-sql-endpoint
+    ports:
+      - "50050:50050"
+    environment:
+      - RUST_LOG=ballista=info,ballista_scheduler=info
+    healthcheck:
+      test: ["CMD", "bash", "-c", "</dev/tcp/127.0.0.1/50050"]
+      interval: 5s
+      timeout: 5s
+      retries: 10
+    restart: "no"
+
+  ballista-executor:
+    image: ghcr.io/apache/datafusion-ballista-executor:latest
+    command: >
+      --bind-host 0.0.0.0
+      --scheduler-host ballista-scheduler
+      --concurrent-tasks 4
+      --work-dir /work
+    environment:
+      - RUST_LOG=ballista=info,ballista_executor=info
+    # Uncomment to mount local data for queries:
+    # volumes:
+    #   - /absolute/path/to/your/data:/data:ro
+    depends_on:
+      ballista-scheduler:
+        condition: service_healthy
+    healthcheck:
+      test: ["CMD", "bash", "-c", "</dev/tcp/127.0.0.1/50051"]
+      interval: 5s
+      timeout: 5s
+      retries: 10
+    restart: "no"
+    deploy:
+      replicas: 2

Review Comment:
   Isn't this taken into account only when Docker Swarm is used?



##########
docs/source/user-guide/deployment/quick-start.md:
##########
@@ -19,128 +19,167 @@
 
 # Ballista Quickstart
 
-A simple way to start a local cluster for testing purposes is to use cargo to build the project and then run the scheduler and executor binaries directly.
+There are two ways to get a local Ballista cluster running. Choose based on your goal:
 
-Project Requirements:
+| | [Evaluate Ballista](#path-a-evaluate-with-docker-2-min) | [Build from source](#path-b-build-from-source-20-min) |
+|---|---|---|
+| Goal | Try Ballista against the last stable release | Develop or test against local code changes |
+| Prerequisites | Docker | Rust, protoc |
+| Cold start time | ~2 min (image pull) | ~20 min (full compile) |
+| Terminals needed | 1 | 3 |
+
+> [!IMPORTANT]
+> Ballista and DataFusion are developed independently. A given Ballista release may not be compatible
+> with the latest DataFusion version. Check the [compatibility matrix](../configs.md) before integrating.
+
+---
+
+## Path A: Evaluate with Docker (~2 min)
+
+The only prerequisite is [Docker](https://docs.docker.com/get-docker/) with Compose v2.
+
+This uses pre-built images from GHCR that are published on each stable release. The `latest` tag
+tracks the most recent release, not the `main` branch.
+
+```shell
+docker compose -f docker-compose.quick.yml up
+```
+
+You should see output similar to:
+
+```
+ballista-scheduler-1  | Ballista Scheduler v53.0.0 listening on 0.0.0.0:50050
+ballista-executor-1   | Executor registration succeed
+ballista-executor-2   | Executor registration succeed
+```
+
+Two executors start by default. The scheduler listens on `localhost:50050`.
+
+**Connect from Rust:**
+
+```rust
+let ctx = SessionContext::remote("df://localhost:50050").await?;
+```
+
+**Connect from the CLI** (requires a local build — no pre-built CLI image is published):
+
+```shell
+cargo run -p ballista-cli -- --host localhost --port 50050
+```
+
+**To make local data available inside the executors**, uncomment and set the `volumes` block
+in `docker-compose.quick.yml`:
+
+```yaml
+ballista-executor:
+  volumes:
+    - /absolute/path/to/your/data:/data:ro
+```
+
+Then reference `/data/yourfile.parquet` in your queries. The path must be the same inside
+every executor container.
+
+**Tear down:**
+
+```shell
+docker compose -f docker-compose.quick.yml down
+```
+
+---
+
+## Path B: Build from source (~20 min)
+
+Use this path if you need to test local code changes or run against the `main` branch.
+
+**Prerequisites:**
 
 - [Rust](https://www.rust-lang.org/tools/install)
 - [Protobuf Compiler](https://protobuf.dev/downloads/)
 
-## Build the project
-
-From the root of the project, build release binaries.
+**Step 1:** Build release binaries from the repository root:
 
 ```shell
 cargo build --release
 ```
 
-Start a Ballista scheduler process in a new terminal session.
+**Step 2:** Start the scheduler in a new terminal:
 
 ```shell
 RUST_LOG=info ./target/release/ballista-scheduler
 ```
 
-Start one or more Ballista executor processes in new terminal sessions. When starting more than one
-executor, a unique port number must be specified for each executor.
+**Step 3:** Start one or more executors, each in a new terminal. When running multiple
+executors, each needs a unique pair of ports:
 
 ```shell
 RUST_LOG=info ./target/release/ballista-executor -c 2 -p 50051 --bind-grpc-port 50052
+```
 
+```shell
 RUST_LOG=info ./target/release/ballista-executor -c 2 -p 50053 --bind-grpc-port 50054
 ```
 
+> **Two-port model:** each executor exposes an Arrow Flight port (data, `-p`) and a gRPC
+> control port (`--bind-grpc-port`). Both must be reachable by the scheduler.
+
+---
+
 ## Running the examples
 
-The examples can be run using the `cargo run --bin` syntax. Open a new terminal session and run the following commands.
+Examples live in the `examples/` directory and connect to `localhost:50050` by default.
 
-### Distributed SQL Example
+### Distributed SQL example
 
 ```bash
 cd examples
 cargo run --release --example remote-sql
 ```
 
-#### Source code for distributed SQL example
+### Distributed DataFrame example
 
-```rust
-use ballista::prelude::*;
-use ballista_examples::test_util;
-use datafusion::{
-    execution::SessionStateBuilder,
-    prelude::{CsvReadOptions, SessionConfig, SessionContext},
-};
-
-/// This example demonstrates executing a simple query against an Arrow data source (CSV) and
-/// fetching results, using SQL
-#[tokio::main]
-async fn main() -> Result<()> {
-    let config = SessionConfig::new_with_ballista()
-        .with_target_partitions(4)
-        .with_ballista_job_name("Remote SQL Example");
-
-    let state = SessionStateBuilder::new()
-        .with_config(config)
-        .with_default_features()
-        .build();
-
-    let ctx = SessionContext::remote_with_state("df://localhost:50050", state).await?;
-
-    let test_data = test_util::examples_test_data();
-
-    ctx.register_csv(
-        "test",
-        &format!("{test_data}/aggregate_test_100.csv"),
-        CsvReadOptions::new(),
-    )
-    .await?;
-
-    let df = ctx
-        .sql(
-            "SELECT c1, MIN(c12), MAX(c12) \
-        FROM test \
-        WHERE c11 > 0.1 AND c11 < 0.9 \
-        GROUP BY c1",
-        )
-        .await?;
-
-    df.show().await?;
-
-    Ok(())
-}
+```bash
+cd examples
+cargo run --release --example remote-dataframe
 ```
 
-### Distributed DataFrame Example
+### Standalone (single-process) example
+
+No cluster needed — scheduler and executor run in the same process:
 
 ```bash
 cd examples
-cargo run --release --example remote-dataframe
+cargo run --release --example standalone-sql
 ```
 
-#### Source code for distributed DataFrame example

Review Comment:
   I found this source code useful. Maybe replace it with a link to the standalone-sql example?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

