This is an automated email from the ASF dual-hosted git repository.
jonkeane pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git
The following commit(s) were added to refs/heads/master by this push:
new bc4a82fd5b ARROW-16626: [C++] Name the C++ streaming execution engine
bc4a82fd5b is described below
commit bc4a82fd5b65d90e97b773ca728442f369eb9951
Author: Weston Pace <[email protected]>
AuthorDate: Wed Jun 1 17:26:14 2022 -0500
ARROW-16626: [C++] Name the C++ streaming execution engine
Closes #13207 from westonpace/feature/ARROW-16626--name-query-engine
Lead-authored-by: Weston Pace <[email protected]>
Co-authored-by: Will Jones <[email protected]>
Co-authored-by: Jonathan Keane <[email protected]>
Signed-off-by: Jonathan Keane <[email protected]>
---
docs/source/cpp/overview.rst | 3 +++
docs/source/cpp/streaming_execution.rst | 39 +++++++++++++++++----------------
2 files changed, 23 insertions(+), 19 deletions(-)
diff --git a/docs/source/cpp/overview.rst b/docs/source/cpp/overview.rst
index ccebdba45d..33f075bd18 100644
--- a/docs/source/cpp/overview.rst
+++ b/docs/source/cpp/overview.rst
@@ -66,6 +66,9 @@ reference.
**Kernels** are specialized computation functions running in a loop over a
given set of datums representing input and output parameters to the functions.
+**Acero** (pronounced [aˈsɜɹo] / ah-SERR-oh) is a streaming execution engine that allows
+computation to be expressed as a graph of operators which can transform streams of data.
+
The IO layer
------------
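The overview text added above describes Acero as a streaming execution engine in which computation is expressed as a graph of operators transforming streams of data. As a rough illustration of that push-based model, here is a toy, self-contained sketch; the type and function names (`Node`, `FilterNode`, `SinkNode`, `RunToySource`) are invented for illustration and are not Arrow's actual API:

```cpp
// Toy sketch of a push-based operator graph in the spirit of Acero.
// NOT Arrow's real API: Batch stands in for arrow::compute::ExecBatch,
// and Node stands in for arrow::compute::ExecNode.
#include <cassert>
#include <utility>
#include <vector>

using Batch = std::vector<int>;  // stand-in for ExecBatch

struct Node {
    Node* output = nullptr;                   // downstream edge of the graph
    virtual void InputReceived(Batch b) = 0;  // upstream nodes push batches here
    virtual void InputFinished() {            // propagate end-of-stream downstream
        if (output) output->InputFinished();
    }
    virtual ~Node() = default;
};

// Transforms each batch as it is pushed in (here: keep even values only),
// then pushes the finalized batch along the edge to its output.
struct FilterNode : Node {
    void InputReceived(Batch b) override {
        Batch out;
        for (int v : b) {
            if (v % 2 == 0) out.push_back(v);
        }
        output->InputReceived(std::move(out));  // assumes an output is connected
    }
};

// Consumes batches at the end of the graph.
struct SinkNode : Node {
    std::vector<Batch> collected;
    bool finished = false;
    void InputReceived(Batch b) override { collected.push_back(std::move(b)); }
    void InputFinished() override { finished = true; }
};

// Drives the plan: a source pushes each batch to its output, then signals
// that the stream is finished. Only one batch is in flight at a time, so
// the working set stays small regardless of total input size.
void RunToySource(Node* out, const std::vector<Batch>& batches) {
    for (const Batch& b : batches) out->InputReceived(b);
    out->InputFinished();
}
```

This is only a model of the control flow (batches pushed edge-by-edge, end-of-stream propagated when inputs are finalized); the real engine adds schemas, asynchrony, and backpressure.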
diff --git a/docs/source/cpp/streaming_execution.rst b/docs/source/cpp/streaming_execution.rst
index 649968ad43..7ce25f587d 100644
--- a/docs/source/cpp/streaming_execution.rst
+++ b/docs/source/cpp/streaming_execution.rst
@@ -19,14 +19,13 @@
.. highlight:: cpp
.. cpp:namespace:: arrow::compute
-==========================
-Streaming execution engine
-==========================
+=======================================
+Acero: A C++ streaming execution engine
+=======================================
.. warning::
- The streaming execution engine is experimental, and a stable API
- is not yet guaranteed.
+ Acero is experimental and a stable API is not yet guaranteed.
Motivation
==========
@@ -35,20 +34,23 @@ For many complex computations, successive direct
:ref:`invocation of compute functions <invoking-compute-functions>` is not feasible
in either memory or computation time. Doing so causes all intermediate
data to be fully materialized. To facilitate arbitrarily large inputs
-and more efficient resource usage, Arrow also provides a streaming query
-engine with which computations can be formulated and executed.
+and more efficient resource usage, the Arrow C++ implementation also
+provides Acero, a streaming query engine with which computations can
+be formulated and executed.
.. image:: simple_graph.svg
:alt: An example graph of a streaming execution workflow.
-:class:`ExecNode` is provided to reify the graph of operations in a query.
-Batches of data (:struct:`ExecBatch`) flow along edges of the graph from
-node to node. Structuring the API around streams of batches allows the
-working set for each node to be tuned for optimal performance independent
-of any other nodes in the graph. Each :class:`ExecNode` processes batches
-as they are pushed to it along an edge of the graph by upstream nodes
-(its inputs), and pushes batches along an edge of the graph to downstream
-nodes (its outputs) as they are finalized.
+Acero allows computation to be expressed as an "execution plan"
+(:class:`ExecPlan`) which is a directed graph of operators. Each operator
+(:class:`ExecNode`) provides, transforms, or consumes the data passing
+through it. Batches of data (:struct:`ExecBatch`) flow along edges of
+the graph from node to node. Structuring the API around streams of batches
+allows the working set for each node to be tuned for optimal performance
+independent of any other nodes in the graph. Each :class:`ExecNode`
+processes batches as they are pushed to it along an edge of the graph by
+upstream nodes (its inputs), and pushes batches along an edge of the graph
+to downstream nodes (its outputs) as they are finalized.
.. seealso::
@@ -366,10 +368,9 @@ This function might be reading a file, iterating through an in memory structure,
from a network connection. The arrow library refers to these functions as ``arrow::AsyncGenerator``
and there are a number of utilities for working with these functions. For this example we use
a vector of record batches that we've already stored in memory.
-In addition, the schema of the data must be known up front. Arrow's streaming execution
-engine must know the schema of the data at each stage of the execution graph before any
-processing has begun. This means we must supply the schema for a source node separately
-from the data itself.
+In addition, the schema of the data must be known up front. Acero must know the schema of the data
+at each stage of the execution graph before any processing has begun. This means we must supply the
+schema for a source node separately from the data itself.
Here we define a struct to hold the data generator definition. This includes in-memory batches, schema
and a function that serves as a data generator :
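The closing paragraph describes a struct bundling in-memory batches, a schema known up front, and a function serving as the data generator. A minimal self-contained sketch of that idea follows; the stand-in types here (`Batch`, `Schema`, the `std::function` generator) are hypothetical simplifications, not Arrow's real `ExecBatch`, `arrow::Schema`, or `arrow::AsyncGenerator`:

```cpp
// Toy sketch (hypothetical stand-in types, not Arrow's real API): a source
// definition that carries the schema separately from the data, as the text
// above requires, plus a generator function that yields one batch per call.
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <optional>
#include <string>
#include <vector>

using Batch = std::vector<int>;           // stand-in for ExecBatch
using Schema = std::vector<std::string>;  // stand-in for arrow::Schema

struct BatchesWithSchema {
    std::vector<Batch> batches;  // the in-memory data
    Schema schema;               // known up front, before any batch is produced

    // Returns a pull-style generator in the spirit of an AsyncGenerator:
    // each call yields the next batch, or nullopt once the stream ends.
    std::function<std::optional<Batch>()> gen() const {
        auto next = std::make_shared<std::size_t>(0);
        auto data = batches;  // captured copy of the in-memory batches
        return [next, data]() -> std::optional<Batch> {
            if (*next >= data.size()) return std::nullopt;
            return data[(*next)++];
        };
    }
};
```

Keeping `schema` as a separate field mirrors the constraint stated above: every stage of the graph must know its output schema before any data flows, so the source cannot infer it lazily from the first batch.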