This is an automated email from the ASF dual-hosted git repository.
houqp pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow-datafusion.git
The following commit(s) were added to refs/heads/master by this push:
new 7f3fe46 Improve Ballista crate README content (#878)
7f3fe46 is described below
commit 7f3fe466b8c4585d17509fb84c78738ebe909ded
Author: Andy Grove <[email protected]>
AuthorDate: Sun Aug 15 17:14:24 2021 -0600
Improve Ballista crate README content (#878)
* Improve Ballista crate README content
* prettier
* use doc include_str
* format
---
ballista/README.md | 3 +-
ballista/rust/client/README.md | 100 ++++++++++++++++++++++++++++++++++++-
ballista/rust/client/src/lib.rs | 98 +-----------------------------------
ballista/rust/core/README.md | 6 ++-
ballista/rust/core/src/lib.rs | 7 +--
ballista/rust/executor/README.md | 15 ++----
ballista/rust/executor/src/lib.rs | 5 +-
ballista/rust/scheduler/README.md | 37 ++------------
ballista/rust/scheduler/src/lib.rs | 5 +-
9 files changed, 113 insertions(+), 163 deletions(-)
diff --git a/ballista/README.md b/ballista/README.md
index eeb4273..1fc3bdb 100644
--- a/ballista/README.md
+++ b/ballista/README.md
@@ -37,8 +37,7 @@ redundancy in the case of a scheduler failing.
# Getting Started
-Fully working examples are available. Refer to the [Ballista Examples README](../ballista-examples/README.md) for
-more information.
+Refer to the core [Ballista crate README](rust/client/README.md) for the Getting Started guide.
## Distributed Scheduler Overview
diff --git a/ballista/rust/client/README.md b/ballista/rust/client/README.md
index a9fbd8e..eb68e68 100644
--- a/ballista/rust/client/README.md
+++ b/ballista/rust/client/README.md
@@ -17,6 +17,102 @@
under the License.
-->
-# Ballista - Rust
+# Ballista: Distributed Scheduler for Apache Arrow DataFusion
-This crate contains the Ballista client library. For an example usage, please refer [here](../benchmarks/tpch/README.md).
+Ballista is a distributed compute platform primarily implemented in Rust, and powered by Apache Arrow and
+DataFusion. It is built on an architecture that allows other programming languages (such as Python, C++, and
+Java) to be supported as first-class citizens without paying a penalty for serialization costs.
+
+The foundational technologies in Ballista are:
+
+- [Apache Arrow](https://arrow.apache.org/) memory model and compute kernels for efficient processing of data.
+- [Apache Arrow Flight Protocol](https://arrow.apache.org/blog/2019/10/13/introducing-arrow-flight/) for efficient
+ data transfer between processes.
+- [Google Protocol Buffers](https://developers.google.com/protocol-buffers) for serializing query plans.
+- [Docker](https://www.docker.com/) for packaging up executors along with user-defined code.
+
+Ballista can be deployed as a standalone cluster and also supports [Kubernetes](https://kubernetes.io/). In either
+case, the scheduler can be configured to use [etcd](https://etcd.io/) as a backing store to (eventually) provide
+redundancy in the case of a scheduler failing.
+
+## Starting a cluster
+
+There are numerous ways to start a Ballista cluster, including support for Docker and
+Kubernetes. For full documentation, refer to the
+[DataFusion User Guide](https://github.com/apache/arrow-datafusion/tree/master/docs/user-guide)
+
+A simple way to start a local cluster for testing purposes is to use cargo to install
+the scheduler and executor crates.
+
+```bash
+cargo install ballista-scheduler
+cargo install ballista-executor
+```
+
+With these crates installed, it is now possible to start a scheduler process.
+
+```bash
+RUST_LOG=info ballista-scheduler
+```
+
+The scheduler will bind to port 50050 by default.
+
+Next, start an executor processes in a new terminal session with the specified concurrency
+level.
+
+```bash
+RUST_LOG=info ballista-executor -c 4
+```
+
+The executor will bind to port 50051 by default. Additional executors can be started by
+manually specifying a bind port. For example:
+
+```bash
+RUST_LOG=info ballista-executor --bind-port 50052 -c 4
+```
+
+## Executing a query
+
+Ballista provides a `BallistaContext` as a starting point for creating queries. DataFrames can be created
+by invoking the `read_csv`, `read_parquet`, and `sql` methods.
+
+The following example runs a simple aggregate SQL query against a CSV file from the
+[New York Taxi and Limousine Commission](https://www1.nyc.gov/site/tlc/about/tlc-trip-record-data.page)
+data set.
+
+```rust,no_run
+use ballista::prelude::*;
+use datafusion::arrow::util::pretty;
+use datafusion::prelude::CsvReadOptions;
+
+#[tokio::main]
+async fn main() -> Result<()> {
+    // create configuration
+    let config = BallistaConfig::builder()
+        .set("ballista.shuffle.partitions", "4")
+        .build()?;
+
+    // connect to Ballista scheduler
+    let ctx = BallistaContext::remote("localhost", 50050, &config);
+
+    // register csv file with the execution context
+    ctx.register_csv(
+        "tripdata",
+        "/path/to/yellow_tripdata_2020-01.csv",
+        CsvReadOptions::new(),
+    )?;
+
+    // execute the query
+    let df = ctx.sql(
+        "SELECT passenger_count, MIN(fare_amount), MAX(fare_amount), AVG(fare_amount), SUM(fare_amount)
+        FROM tripdata
+        GROUP BY passenger_count
+        ORDER BY passenger_count",
+    )?;
+
+    // collect the results and print them to stdout
+    let results = df.collect().await?;
+    pretty::print_batches(&results)?;
+    Ok(())
+}
+```
diff --git a/ballista/rust/client/src/lib.rs b/ballista/rust/client/src/lib.rs
index 35bd12b..125278d 100644
--- a/ballista/rust/client/src/lib.rs
+++ b/ballista/rust/client/src/lib.rs
@@ -15,103 +15,7 @@
// specific language governing permissions and limitations
// under the License.
-//! Ballista is a distributed compute platform primarily implemented in Rust, and powered by Apache Arrow and
-//! DataFusion. It is built on an architecture that allows other programming languages (such as Python, C++, and
-//! Java) to be supported as first-class citizens without paying a penalty for serialization costs.
-//!
-//! The foundational technologies in Ballista are:
-//!
-//! - [Apache Arrow](https://arrow.apache.org/) memory model and compute kernels for efficient processing of data.
-//! - [Apache Arrow Flight Protocol](https://arrow.apache.org/blog/2019/10/13/introducing-arrow-flight/) for efficient
-//! data transfer between processes.
-//! - [Google Protocol Buffers](https://developers.google.com/protocol-buffers) for serializing query plans.
-//! - [Docker](https://www.docker.com/) for packaging up executors along with user-defined code.
-//!
-//! Ballista can be deployed as a standalone cluster and also supports [Kubernetes](https://kubernetes.io/). In either
-//! case, the scheduler can be configured to use [etcd](https://etcd.io/) as a backing store to (eventually) provide
-//! redundancy in the case of a scheduler failing.
-//!
-//! ## Starting a cluster
-//!
-//! There are numerous ways to start a Ballista cluster, including support for Docker and
-//! Kubernetes. For full documentation, refer to the
-//! [DataFusion User Guide](https://github.com/apache/arrow-datafusion/tree/master/docs/user-guide)
-//!
-//! A simple way to start a local cluster for testing purposes is to use cargo to install
-//! the scheduler and executor crates.
-//!
-//! ```bash
-//! cargo install ballista-scheduler
-//! cargo install ballista-executor
-//! ```
-//!
-//! With these crates installed, it is now possible to start a scheduler process.
-//!
-//! ```bash
-//! RUST_LOG=info ballista-scheduler
-//! ```
-//!
-//! The scheduler will bind to port 50050 by default.
-//!
-//! Next, start an executor processes in a new terminal session with the specified concurrency
-//! level.
-//!
-//! ```bash
-//! RUST_LOG=info ballista-executor -c 4
-//! ```
-//!
-//! The executor will bind to port 50051 by default. Additional executors can be started by
-//! manually specifying a bind port. For example:
-//!
-//! ```bash
-//! RUST_LOG=info ballista-executor --bind-port 50052 -c 4
-//! ```
-//!
-//! ## Executing a query
-//!
-//! Ballista provides a `BallistaContext` as a starting point for creating queries. DataFrames can be created
-//! by invoking the `read_csv`, `read_parquet`, and `sql` methods.
-//!
-//! The following example runs a simple aggregate SQL query against a CSV file from the
-//! [New York Taxi and Limousine Commission](https://www1.nyc.gov/site/tlc/about/tlc-trip-record-data.page)
-//! data set.
-//!
-//! ```no_run
-//! use ballista::prelude::*;
-//! use datafusion::arrow::util::pretty;
-//! use datafusion::prelude::CsvReadOptions;
-//!
-//! #[tokio::main]
-//! async fn main() -> Result<()> {
-//!     // create configuration
-//!     let config = BallistaConfig::builder()
-//!         .set("ballista.shuffle.partitions", "4")
-//!         .build()?;
-//!
-//!     // connect to Ballista scheduler
-//!     let ctx = BallistaContext::remote("localhost", 50050, &config);
-//!
-//!     // register csv file with the execution context
-//!     ctx.register_csv(
-//!         "tripdata",
-//!         "/path/to/yellow_tripdata_2020-01.csv",
-//!         CsvReadOptions::new(),
-//!     )?;
-//!
-//!     // execute the query
-//!     let df = ctx.sql(
-//!         "SELECT passenger_count, MIN(fare_amount), MAX(fare_amount), AVG(fare_amount), SUM(fare_amount)
-//!         FROM tripdata
-//!         GROUP BY passenger_count
-//!         ORDER BY passenger_count",
-//!     )?;
-//!
-//!     // collect the results and print them to stdout
-//!     let results = df.collect().await?;
-//!     pretty::print_batches(&results)?;
-//!     Ok(())
-//! }
-//! ```
+#![doc = include_str!("../README.md")]
pub mod columnar_batch;
pub mod context;
diff --git a/ballista/rust/core/README.md b/ballista/rust/core/README.md
index d51ae2f..8d7edd0 100644
--- a/ballista/rust/core/README.md
+++ b/ballista/rust/core/README.md
@@ -17,6 +17,8 @@
under the License.
-->
-# Ballista - Rust
+# Ballista Core Library
-This crate contains the core Ballista types.
+This crate contains the Ballista core library which is used as a dependency by the `ballista`,
+`ballista-scheduler`, and `ballista-executor` crates. Refer to <https://crates.io/crates/ballista> for
+general Ballista documentation.
diff --git a/ballista/rust/core/src/lib.rs b/ballista/rust/core/src/lib.rs
index 614bf9a..9e892f5 100644
--- a/ballista/rust/core/src/lib.rs
+++ b/ballista/rust/core/src/lib.rs
@@ -15,12 +15,7 @@
// specific language governing permissions and limitations
// under the License.
-//! Ballista Core Library
-//!
-//! This crate contains the Ballista core library which is used as a dependency by the ballista,
-//! ballista-scheduler, and ballista-executor crates. Refer to <https://crates.io/crates/ballista> for
-//! general Ballista documentation.
-
+#![doc = include_str!("../README.md")]
#![allow(unused_imports)]
pub const BALLISTA_VERSION: &str = env!("CARGO_PKG_VERSION");
diff --git a/ballista/rust/executor/README.md b/ballista/rust/executor/README.md
index 105e027..731c2dc 100644
--- a/ballista/rust/executor/README.md
+++ b/ballista/rust/executor/README.md
@@ -17,16 +17,7 @@
under the License.
-->
-# Ballista Executor - Rust
+# Ballista Executor Process
-This crate contains the Ballista Executor. It can be used both as a library or as a binary.
-
-## Run
-
-```bash
-RUST_LOG=info cargo run --release
-...
-[2021-02-11T05:30:13Z INFO executor] Running with config: ExecutorConfig { host: "localhost", port: 50051, work_dir: "/var/folders/y8/fc61kyjd4n53tn444n72rjrm0000gn/T/.tmpv1LjN0", concurrent_tasks: 4 }
-```
-
-By default, the executor will bind to `localhost` and listen on port `50051`.
+This crate contains the Ballista executor process. Refer to <https://crates.io/crates/ballista> for
+documentation.
diff --git a/ballista/rust/executor/src/lib.rs b/ballista/rust/executor/src/lib.rs
index f2abf31..714698b 100644
--- a/ballista/rust/executor/src/lib.rs
+++ b/ballista/rust/executor/src/lib.rs
@@ -15,10 +15,7 @@
// specific language governing permissions and limitations
// under the License.
-//! Ballista Executor Process
-//!
-//! This crate contains the Ballista executor process. Refer to <https://crates.io/crates/ballista> for
-//! documentation.
+#![doc = include_str!("../README.md")]
pub mod collect;
pub mod execution_loop;
diff --git a/ballista/rust/scheduler/README.md b/ballista/rust/scheduler/README.md
index 78a8000..fbc35e4 100644
--- a/ballista/rust/scheduler/README.md
+++ b/ballista/rust/scheduler/README.md
@@ -17,38 +17,7 @@
under the License.
-->
-# Ballista Scheduler
+# Ballista Scheduler Process
-This crate contains the Ballista Scheduler. It can be used both as a library or as a binary.
-
-## Run
-
-```bash
-$ RUST_LOG=info cargo run --release
-...
-[2021-02-11T05:29:30Z INFO scheduler] Ballista v0.4.2-SNAPSHOT Scheduler listening on 0.0.0.0:50050
-[2021-02-11T05:30:13Z INFO ballista::scheduler] Received register_executor request for ExecutorMetadata { id: "6d10f5d2-c8c3-4e0f-afdb-1f6ec9171321", host: "localhost", port: 50051 }
-```
-
-By default, the scheduler will bind to `localhost` and listen on port `50051`.
-
-## Connecting to Scheduler
-
-Scheduler supports REST model also using content negotiation.
-For e.x if you want to get list of executors connected to the scheduler,
-you can do (assuming you use default config)
-
-```bash
-curl --request GET \
- --url http://localhost:50050/executors \
- --header 'Accept: application/json'
-```
-
-## Scheduler UI
-
-A basic ui for the scheduler is in `ui/scheduler` of the ballista repo.
-It can be started using the following [yarn](https://yarnpkg.com/) command
-
-```bash
-yarn && yarn start
-```
+This crate contains the Ballista scheduler process. Refer to <https://crates.io/crates/ballista> for
+documentation.
diff --git a/ballista/rust/scheduler/src/lib.rs b/ballista/rust/scheduler/src/lib.rs
index 676975f..b476b77 100644
--- a/ballista/rust/scheduler/src/lib.rs
+++ b/ballista/rust/scheduler/src/lib.rs
@@ -15,10 +15,7 @@
// specific language governing permissions and limitations
// under the License.
-//! Ballista Scheduler Process
-//!
-//! This crate contains the Ballista scheduler process. Refer to <https://crates.io/crates/ballista> for
-//! documentation.
+#![doc = include_str!("../README.md")]
pub mod api;
pub mod planner;