This is an automated email from the ASF dual-hosted git repository.
jorgecarleitao pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow-datafusion.git
The following commit(s) were added to refs/heads/master by this push:
new 1702d6c Make it easer for developers to find Ballista documentation (#330)
1702d6c is described below
commit 1702d6c85ebfdbc968b1dc427a9799e74b64ff96
Author: Andy Grove <[email protected]>
AuthorDate: Fri May 14 15:03:53 2021 -0600
Make it easer for developers to find Ballista documentation (#330)
---
DEVELOPERS.md | 3 ++
README.md | 3 ++
ballista/README.md | 15 +++---
ballista/docs/README.md | 7 +--
ballista/docs/{dev-env-rust.md => dev-env.md} | 0
ballista/docs/integration-testing.md | 10 ++--
ballista/docs/release-process.md | 68 ---------------------------
ballista/docs/rust-docker.md | 66 --------------------------
8 files changed, 18 insertions(+), 154 deletions(-)
diff --git a/DEVELOPERS.md b/DEVELOPERS.md
index 1dc9304..be8bb61 100644
--- a/DEVELOPERS.md
+++ b/DEVELOPERS.md
@@ -21,6 +21,9 @@
This section describes how you can get started at developing DataFusion.
+For information on developing with Ballista, see the
+[Ballista developer documentation](ballista/docs/README.md).
+
### Bootstrap environment
DataFusion is written in Rust and it uses a standard rust toolkit:
diff --git a/README.md b/README.md
index ded264a..f72c73b 100644
--- a/README.md
+++ b/README.md
@@ -30,6 +30,9 @@ logical query plans as well as a query optimizer and execution engine
capable of parallel execution against partitioned data sources (CSV
and Parquet) using threads.
+DataFusion also supports distributed query execution via the
+[Ballista](ballista/README.md) crate.
+
## Use Cases
DataFusion is used to create modern, fast and efficient data
diff --git a/ballista/README.md b/ballista/README.md
index 288386f..276af3c 100644
--- a/ballista/README.md
+++ b/ballista/README.md
@@ -50,15 +50,14 @@ Although Ballista is largely inspired by Apache Spark, there are some key differ
- The use of Apache Arrow as the memory model and network protocol means that data can be exchanged between executors
in any programming language with minimal serialization overhead.
-# Status
+## Status
-The Ballista project was donated to Apache Arrow in April 2021 and work is underway to integrate more tightly with
-DataFusion.
-
-One of the goals is to implement a common scheduler that can seamlessly scale queries across cores in DataFusion and
-across nodes in Ballista.
-
-Ballista issues are tracked in ASF JIRA [here](https://issues.apache.org/jira/issues/?jql=project%20%3D%20ARROW%20AND%20component%20%3D%20%22Rust%20-%20Ballista%22)
+Ballista was [donated](https://arrow.apache.org/blog/2021/04/12/ballista-donation/) to the Apache Arrow project in
+April 2021 and should be considered experimental.
+## Getting Started
+The [Ballista Developer Documentation](docs/README.md) and the
+[DataFusion User Guide](https://github.com/apache/arrow-datafusion/tree/master/docs/user-guide) are currently the
+best sources of information for getting started with Ballista.
diff --git a/ballista/docs/README.md b/ballista/docs/README.md
index 44c831d..6588c1d 100644
--- a/ballista/docs/README.md
+++ b/ballista/docs/README.md
@@ -20,7 +20,7 @@
This directory contains documentation for developers that are contributing to Ballista. If you are looking for
end-user documentation for a published release, please start with the
-[Ballista User Guide](https://ballistacompute.org/docs/) instead.
+[DataFusion User Guide](../../docs/user-guide) instead.
## Architecture & Design
@@ -29,9 +29,6 @@ end-user documentation for a published release, please start with the
## Build, Test, Release
-- Setting up a [Rust development environment](dev-env-rust.md).
-- Setting up a [Java development environment](dev-env-jvm.md).
-- Notes on building [Rust docker images](rust-docker.md)
+- Setting up a [development environment](dev-env.md).
- [Integration Testing](integration-testing.md)
-- [Release process](release-process.md)
diff --git a/ballista/docs/dev-env-rust.md b/ballista/docs/dev-env.md
similarity index 100%
rename from ballista/docs/dev-env-rust.md
rename to ballista/docs/dev-env.md
diff --git a/ballista/docs/integration-testing.md b/ballista/docs/integration-testing.md
index 2a979b6..3f818a4 100644
--- a/ballista/docs/integration-testing.md
+++ b/ballista/docs/integration-testing.md
@@ -18,15 +18,11 @@
-->
# Integration Testing
-Ballista has a [benchmark crate](https://github.com/ballista-compute/ballista/tree/main/rust/benchmarks/tpch) which is
-derived from TPC-H and this is currently the main form of integration testing.
+We use the [DataFusion Benchmarks](https://github.com/apache/arrow-datafusion/tree/master/benchmarks) for integration
+testing.
-The following command can be used to run the integration tests.
+The integration tests can be executed by running the following command from the root of the DataFusion repository.
```bash
./dev/integration-tests.sh
```
-
-Please refer to the
-[benchmark documentation](https://github.com/ballista-compute/ballista/blob/main/rust/benchmarks/tpch/README.md)
-for more information.
diff --git a/ballista/docs/release-process.md b/ballista/docs/release-process.md
deleted file mode 100644
index c6c45c3..0000000
--- a/ballista/docs/release-process.md
+++ /dev/null
@@ -1,68 +0,0 @@
-<!---
- Licensed to the Apache Software Foundation (ASF) under one
- or more contributor license agreements. See the NOTICE file
- distributed with this work for additional information
- regarding copyright ownership. The ASF licenses this file
- to you under the Apache License, Version 2.0 (the
- "License"); you may not use this file except in compliance
- with the License. You may obtain a copy of the License at
-
- http://www.apache.org/licenses/LICENSE-2.0
-
- Unless required by applicable law or agreed to in writing,
- software distributed under the License is distributed on an
- "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
- KIND, either express or implied. See the License for the
- specific language governing permissions and limitations
- under the License.
--->
-# Release Process
-
-These instructions are for project maintainers wishing to create public releases of Ballista.
-
-- Create a `release-0.4` branch or merge latest from `main` into an existing `release-0.4` branch.
-- Update version numbers using `./dev/bump-version.sh`
-- Run integration tests with `./dev/integration-tests.sh`
-- Push changes
-- Create `v0.4.x` release tag from the `release-0.4` branch
-- Publish Docker images
-- Publish crate if possible (if we're using a published version of Arrow)
-
-## Publishing Java artifacts to Maven Central
-
-The JVM artifacts are published to Maven central by uploading to sonatype. You will need to set the environment
-variables `SONATYPE_USERNAME` and `SONATYPE_PASSWORD` to the correct values for your account and you will also need
-verified GPG keys available for signing the artifacts (instructions tbd).
-
-Run the follow commands to publish the artifacts to a sonatype staging repository.
-
-```bash
-./dev/publish-jvm.sh
-```
-
-## Publishing Rust Artifacts
-
-Run the following script to publish the Rust crate to crates.io.
-
-```
-./dev/publish-rust.sh
-```
-
-## Publishing Docker Images
-
-Run the following script to publish the executor Docker images to Docker Hub.
-
-```
-./dev/publish-docker-images.sh
-```
-
-## GPG Notes
-
-Refer to [this article](https://help.github.com/en/github/authenticating-to-github/generating-a-new-gpg-key) for
-instructions on setting up GPG keys. Some useful commands are:
-
-```bash
-gpg --full-generate-key
-gpg --export-secret-keys > ~/.gnupg/secring.gpg
-gpg --key-server keys.openpgp.org --send-keys KEYID
-```
\ No newline at end of file
diff --git a/ballista/docs/rust-docker.md b/ballista/docs/rust-docker.md
deleted file mode 100644
index 0b94a14..0000000
--- a/ballista/docs/rust-docker.md
+++ /dev/null
@@ -1,66 +0,0 @@
-<!---
- Licensed to the Apache Software Foundation (ASF) under one
- or more contributor license agreements. See the NOTICE file
- distributed with this work for additional information
- regarding copyright ownership. The ASF licenses this file
- to you under the Apache License, Version 2.0 (the
- "License"); you may not use this file except in compliance
- with the License. You may obtain a copy of the License at
-
- http://www.apache.org/licenses/LICENSE-2.0
-
- Unless required by applicable law or agreed to in writing,
- software distributed under the License is distributed on an
- "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
- KIND, either express or implied. See the License for the
- specific language governing permissions and limitations
- under the License.
--->
-### How to build rust's docker image
-
-To build the docker image in development, use
-
-```
-docker build -f docker/rust.dockerfile -t ballistacompute/ballista-rust:latest .
-```
-
-This uses a multi-stage build, on which the build stage is called `builder`.
-Our github has this target cached, that we use to speed-up the build time:
-
-```
-export BUILDER_IMAGE=docker.pkg.github.com/ballista-compute/ballista/ballista-rust-builder:main
-
-docker login docker.pkg.github.com -u ... -p ... # a personal access token to read from the read:packages
-docker pull $BUILDER_IMAGE
-
-docker build --cache-from $BUILDER_IMAGE -f docker/rust.dockerfile -t ballista:latest .
-```
-
-will build the image by re-using a cached image.
-
-### Docker images for development
-
-This project often requires testing on kubernetes. For this reason, we have a github workflow to push images to
-github's registry, both from this repo and its forks.
-
-The basic principle is that every push to a git reference builds and publishes a docker image.
-Specifically, given a branch or tag `${REF}`,
-
-* `docker.pkg.github.com/ballista-compute/ballista/ballista-rust:${REF}` is the latest image from $REF
-* `docker.pkg.github.com/${USER}/ballista/ballista-rust:${REF}` is the latest image from $REF on your fork
-
-To pull them from a kubernetes cluster or your computer, you need to have a personal access token with scope `read:packages`,
-and login to the registry `docker.pkg.github.com`.
-
-The builder image - the large image with all the cargo caches - is available on the same registry as described above, and is also
-available in all forks and for all references.
-
-Please refer to the [rust workflow](.github/workflows/rust.yaml) and [rust dockerfile](docker/rust.dockerfile) for details on how we build and publish these images.
-
-### Get the binary
-
-If you do not aim to run this in docker but any linux-based machine, you can get the latest binary from a docker image on the registry: the binary is statically linked and thus runs on any linux-based machine. You can get it using
-
-```
-id=$(docker create $BUILDER_IMAGE) && docker cp $id:/executor executor && docker rm -v $id
-```