This is an automated email from the ASF dual-hosted git repository.

houqp pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow-datafusion.git


The following commit(s) were added to refs/heads/master by this push:
     new 7f3fe46  Improve Ballista crate README content (#878)
7f3fe46 is described below

commit 7f3fe466b8c4585d17509fb84c78738ebe909ded
Author: Andy Grove <[email protected]>
AuthorDate: Sun Aug 15 17:14:24 2021 -0600

    Improve Ballista crate README content (#878)
    
    * Improve Ballista crate README content
    
    * prettier
    
    * use doc include_str
    
    * format
---
 ballista/README.md                 |   3 +-
 ballista/rust/client/README.md     | 100 ++++++++++++++++++++++++++++++++++++-
 ballista/rust/client/src/lib.rs    |  98 +-----------------------------------
 ballista/rust/core/README.md       |   6 ++-
 ballista/rust/core/src/lib.rs      |   7 +--
 ballista/rust/executor/README.md   |  15 ++----
 ballista/rust/executor/src/lib.rs  |   5 +-
 ballista/rust/scheduler/README.md  |  37 ++------------
 ballista/rust/scheduler/src/lib.rs |   5 +-
 9 files changed, 113 insertions(+), 163 deletions(-)

diff --git a/ballista/README.md b/ballista/README.md
index eeb4273..1fc3bdb 100644
--- a/ballista/README.md
+++ b/ballista/README.md
@@ -37,8 +37,7 @@ redundancy in the case of a scheduler failing.
 
 # Getting Started
 
-Fully working examples are available. Refer to the [Ballista Examples 
README](../ballista-examples/README.md) for
-more information.
+Refer to the core [Ballista crate README](rust/client/README.md) for the 
Getting Started guide.
 
 ## Distributed Scheduler Overview
 
diff --git a/ballista/rust/client/README.md b/ballista/rust/client/README.md
index a9fbd8e..eb68e68 100644
--- a/ballista/rust/client/README.md
+++ b/ballista/rust/client/README.md
@@ -17,6 +17,102 @@
   under the License.
 -->
 
-# Ballista - Rust
+# Ballista: Distributed Scheduler for Apache Arrow DataFusion
 
-This crate contains the Ballista client library. For an example usage, please 
refer [here](../benchmarks/tpch/README.md).
+Ballista is a distributed compute platform primarily implemented in Rust, and 
powered by Apache Arrow and
+DataFusion. It is built on an architecture that allows other programming 
languages (such as Python, C++, and
+Java) to be supported as first-class citizens without paying a penalty for 
serialization costs.
+
+The foundational technologies in Ballista are:
+
+- [Apache Arrow](https://arrow.apache.org/) memory model and compute kernels 
for efficient processing of data.
+- [Apache Arrow Flight 
Protocol](https://arrow.apache.org/blog/2019/10/13/introducing-arrow-flight/) 
for efficient
+  data transfer between processes.
+- [Google Protocol Buffers](https://developers.google.com/protocol-buffers) 
for serializing query plans.
+- [Docker](https://www.docker.com/) for packaging up executors along with 
user-defined code.
+
+Ballista can be deployed as a standalone cluster and also supports 
[Kubernetes](https://kubernetes.io/). In either
+case, the scheduler can be configured to use [etcd](https://etcd.io/) as a 
backing store to (eventually) provide
+redundancy in the case of a scheduler failing.
+
+## Starting a cluster
+
+There are numerous ways to start a Ballista cluster, including support for 
Docker and
+Kubernetes. For full documentation, refer to the
+[DataFusion User 
Guide](https://github.com/apache/arrow-datafusion/tree/master/docs/user-guide)
+
+A simple way to start a local cluster for testing purposes is to use cargo to 
install
+the scheduler and executor crates.
+
+```bash
+cargo install ballista-scheduler
+cargo install ballista-executor
+```
+
+With these crates installed, it is now possible to start a scheduler process.
+
+```bash
+RUST_LOG=info ballista-scheduler
+```
+
+The scheduler will bind to port 50050 by default.
+
+Next, start an executor processes in a new terminal session with the specified 
concurrency
+level.
+
+```bash
+RUST_LOG=info ballista-executor -c 4
+```
+
+The executor will bind to port 50051 by default. Additional executors can be 
started by
+manually specifying a bind port. For example:
+
+```bash
+RUST_LOG=info ballista-executor --bind-port 50052 -c 4
+```
+
+## Executing a query
+
+Ballista provides a `BallistaContext` as a starting point for creating 
queries. DataFrames can be created
+by invoking the `read_csv`, `read_parquet`, and `sql` methods.
+
+The following example runs a simple aggregate SQL query against a CSV file 
from the
+[New York Taxi and Limousine 
Commission](https://www1.nyc.gov/site/tlc/about/tlc-trip-record-data.page)
+data set.
+
+```rust,no_run
+use ballista::prelude::*;
+use datafusion::arrow::util::pretty;
+use datafusion::prelude::CsvReadOptions;
+
+#[tokio::main]
+async fn main() -> Result<()> {
+   // create configuration
+   let config = BallistaConfig::builder()
+       .set("ballista.shuffle.partitions", "4")
+       .build()?;
+
+   // connect to Ballista scheduler
+   let ctx = BallistaContext::remote("localhost", 50050, &config);
+
+   // register csv file with the execution context
+   ctx.register_csv(
+       "tripdata",
+       "/path/to/yellow_tripdata_2020-01.csv",
+       CsvReadOptions::new(),
+   )?;
+
+   // execute the query
+   let df = ctx.sql(
+       "SELECT passenger_count, MIN(fare_amount), MAX(fare_amount), 
AVG(fare_amount), SUM(fare_amount)
+       FROM tripdata
+       GROUP BY passenger_count
+       ORDER BY passenger_count",
+   )?;
+
+   // collect the results and print them to stdout
+   let results = df.collect().await?;
+   pretty::print_batches(&results)?;
+   Ok(())
+}
+```
diff --git a/ballista/rust/client/src/lib.rs b/ballista/rust/client/src/lib.rs
index 35bd12b..125278d 100644
--- a/ballista/rust/client/src/lib.rs
+++ b/ballista/rust/client/src/lib.rs
@@ -15,103 +15,7 @@
 // specific language governing permissions and limitations
 // under the License.
 
-//! Ballista is a distributed compute platform primarily implemented in Rust, 
and powered by Apache Arrow and
-//! DataFusion. It is built on an architecture that allows other programming 
languages (such as Python, C++, and
-//! Java) to be supported as first-class citizens without paying a penalty for 
serialization costs.
-//!
-//! The foundational technologies in Ballista are:
-//!
-//! - [Apache Arrow](https://arrow.apache.org/) memory model and compute 
kernels for efficient processing of data.
-//! - [Apache Arrow Flight 
Protocol](https://arrow.apache.org/blog/2019/10/13/introducing-arrow-flight/) 
for efficient
-//!   data transfer between processes.
-//! - [Google Protocol 
Buffers](https://developers.google.com/protocol-buffers) for serializing query 
plans.
-//! - [Docker](https://www.docker.com/) for packaging up executors along with 
user-defined code.
-//!
-//! Ballista can be deployed as a standalone cluster and also supports 
[Kubernetes](https://kubernetes.io/). In either
-//! case, the scheduler can be configured to use [etcd](https://etcd.io/) as a 
backing store to (eventually) provide
-//! redundancy in the case of a scheduler failing.
-//!
-//! ## Starting a cluster
-//!
-//! There are numerous ways to start a Ballista cluster, including support for 
Docker and
-//! Kubernetes. For full documentation, refer to the
-//! [DataFusion User 
Guide](https://github.com/apache/arrow-datafusion/tree/master/docs/user-guide)
-//!
-//! A simple way to start a local cluster for testing purposes is to use cargo 
to install
-//! the scheduler and executor crates.
-//!
-//! ```bash
-//! cargo install ballista-scheduler
-//! cargo install ballista-executor
-//! ```
-//!
-//! With these crates installed, it is now possible to start a scheduler 
process.
-//!
-//! ```bash
-//! RUST_LOG=info ballista-scheduler
-//! ```
-//!
-//! The scheduler will bind to port 50050 by default.
-//!
-//! Next, start an executor processes in a new terminal session with the 
specified concurrency
-//! level.
-//!
-//! ```bash
-//! RUST_LOG=info ballista-executor -c 4
-//! ```
-//!
-//! The executor will bind to port 50051 by default. Additional executors can 
be started by
-//! manually specifying a bind port. For example:
-//!
-//! ```bash
-//! RUST_LOG=info ballista-executor --bind-port 50052 -c 4
-//! ```
-//!
-//! ## Executing a query
-//!
-//! Ballista provides a `BallistaContext` as a starting point for creating 
queries. DataFrames can be created
-//! by invoking the `read_csv`, `read_parquet`, and `sql` methods.
-//!
-//! The following example runs a simple aggregate SQL query against a CSV file 
from the
-//! [New York Taxi and Limousine 
Commission](https://www1.nyc.gov/site/tlc/about/tlc-trip-record-data.page)
-//! data set.
-//!
-//! ```no_run
-//! use ballista::prelude::*;
-//! use datafusion::arrow::util::pretty;
-//! use datafusion::prelude::CsvReadOptions;
-//!
-//! #[tokio::main]
-//! async fn main() -> Result<()> {
-//!    // create configuration
-//!    let config = BallistaConfig::builder()
-//!        .set("ballista.shuffle.partitions", "4")
-//!        .build()?;
-//!
-//!    // connect to Ballista scheduler
-//!    let ctx = BallistaContext::remote("localhost", 50050, &config);
-//!
-//!    // register csv file with the execution context
-//!    ctx.register_csv(
-//!        "tripdata",
-//!        "/path/to/yellow_tripdata_2020-01.csv",
-//!        CsvReadOptions::new(),
-//!    )?;
-//!
-//!    // execute the query
-//!    let df = ctx.sql(
-//!        "SELECT passenger_count, MIN(fare_amount), MAX(fare_amount), 
AVG(fare_amount), SUM(fare_amount)
-//!        FROM tripdata
-//!        GROUP BY passenger_count
-//!        ORDER BY passenger_count",
-//!    )?;
-//!
-//!    // collect the results and print them to stdout
-//!    let results = df.collect().await?;
-//!    pretty::print_batches(&results)?;
-//!    Ok(())
-//! }
-//! ```
+#![doc = include_str!("../README.md")]
 
 pub mod columnar_batch;
 pub mod context;
diff --git a/ballista/rust/core/README.md b/ballista/rust/core/README.md
index d51ae2f..8d7edd0 100644
--- a/ballista/rust/core/README.md
+++ b/ballista/rust/core/README.md
@@ -17,6 +17,8 @@
   under the License.
 -->
 
-# Ballista - Rust
+# Ballista Core Library
 
-This crate contains the core Ballista types.
+This crate contains the Ballista core library which is used as a dependency by 
the `ballista`,
+`ballista-scheduler`, and `ballista-executor` crates. Refer to 
<https://crates.io/crates/ballista> for
+general Ballista documentation.
diff --git a/ballista/rust/core/src/lib.rs b/ballista/rust/core/src/lib.rs
index 614bf9a..9e892f5 100644
--- a/ballista/rust/core/src/lib.rs
+++ b/ballista/rust/core/src/lib.rs
@@ -15,12 +15,7 @@
 // specific language governing permissions and limitations
 // under the License.
 
-//! Ballista Core Library
-//!
-//! This crate contains the Ballista core library which is used as a 
dependency by the ballista,
-//! ballista-scheduler, and ballista-executor crates. Refer to 
<https://crates.io/crates/ballista> for
-//! general Ballista documentation.
-
+#![doc = include_str!("../README.md")]
 #![allow(unused_imports)]
 pub const BALLISTA_VERSION: &str = env!("CARGO_PKG_VERSION");
 
diff --git a/ballista/rust/executor/README.md b/ballista/rust/executor/README.md
index 105e027..731c2dc 100644
--- a/ballista/rust/executor/README.md
+++ b/ballista/rust/executor/README.md
@@ -17,16 +17,7 @@
   under the License.
 -->
 
-# Ballista Executor - Rust
+# Ballista Executor Process
 
-This crate contains the Ballista Executor. It can be used both as a library or 
as a binary.
-
-## Run
-
-```bash
-RUST_LOG=info cargo run --release
-...
-[2021-02-11T05:30:13Z INFO  executor] Running with config: ExecutorConfig { 
host: "localhost", port: 50051, work_dir: 
"/var/folders/y8/fc61kyjd4n53tn444n72rjrm0000gn/T/.tmpv1LjN0", 
concurrent_tasks: 4 }
-```
-
-By default, the executor will bind to `localhost` and listen on port `50051`.
+This crate contains the Ballista executor process. Refer to 
<https://crates.io/crates/ballista> for
+documentation.
diff --git a/ballista/rust/executor/src/lib.rs 
b/ballista/rust/executor/src/lib.rs
index f2abf31..714698b 100644
--- a/ballista/rust/executor/src/lib.rs
+++ b/ballista/rust/executor/src/lib.rs
@@ -15,10 +15,7 @@
 // specific language governing permissions and limitations
 // under the License.
 
-//! Ballista Executor Process
-//!
-//! This crate contains the Ballista executor process. Refer to 
<https://crates.io/crates/ballista> for
-//! documentation.
+#![doc = include_str!("../README.md")]
 
 pub mod collect;
 pub mod execution_loop;
diff --git a/ballista/rust/scheduler/README.md 
b/ballista/rust/scheduler/README.md
index 78a8000..fbc35e4 100644
--- a/ballista/rust/scheduler/README.md
+++ b/ballista/rust/scheduler/README.md
@@ -17,38 +17,7 @@
   under the License.
 -->
 
-# Ballista Scheduler
+# Ballista Scheduler Process
 
-This crate contains the Ballista Scheduler. It can be used both as a library 
or as a binary.
-
-## Run
-
-```bash
-$ RUST_LOG=info cargo run --release
-...
-[2021-02-11T05:29:30Z INFO  scheduler] Ballista v0.4.2-SNAPSHOT Scheduler 
listening on 0.0.0.0:50050
-[2021-02-11T05:30:13Z INFO  ballista::scheduler] Received register_executor 
request for ExecutorMetadata { id: "6d10f5d2-c8c3-4e0f-afdb-1f6ec9171321", 
host: "localhost", port: 50051 }
-```
-
-By default, the scheduler will bind to `localhost` and listen on port `50051`.
-
-## Connecting to Scheduler
-
-Scheduler supports REST model also using content negotiation.
-For e.x if you want to get list of executors connected to the scheduler,
-you can do (assuming you use default config)
-
-```bash
-curl --request GET \
-  --url http://localhost:50050/executors \
-  --header 'Accept: application/json'
-```
-
-## Scheduler UI
-
-A basic ui for the scheduler is in `ui/scheduler` of the ballista repo.
-It can be started using the following [yarn](https://yarnpkg.com/) command
-
-```bash
-yarn && yarn start
-```
+This crate contains the Ballista scheduler process. Refer to 
<https://crates.io/crates/ballista> for
+documentation.
diff --git a/ballista/rust/scheduler/src/lib.rs 
b/ballista/rust/scheduler/src/lib.rs
index 676975f..b476b77 100644
--- a/ballista/rust/scheduler/src/lib.rs
+++ b/ballista/rust/scheduler/src/lib.rs
@@ -15,10 +15,7 @@
 // specific language governing permissions and limitations
 // under the License.
 
-//! Ballista Scheduler Process
-//!
-//! This crate contains the Ballista scheduler process. Refer to 
<https://crates.io/crates/ballista> for
-//! documentation.
+#![doc = include_str!("../README.md")]
 
 pub mod api;
 pub mod planner;

Reply via email to