This is an automated email from the ASF dual-hosted git repository.

jonkeane pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


The following commit(s) were added to refs/heads/master by this push:
     new bc4a82fd5b ARROW-16626: [C++] Name the C++ streaming execution engine
bc4a82fd5b is described below

commit bc4a82fd5b65d90e97b773ca728442f369eb9951
Author: Weston Pace <[email protected]>
AuthorDate: Wed Jun 1 17:26:14 2022 -0500

    ARROW-16626: [C++] Name the C++ streaming execution engine
    
    Closes #13207 from westonpace/feature/ARROW-16626--name-query-engine
    
    Lead-authored-by: Weston Pace <[email protected]>
    Co-authored-by: Will Jones <[email protected]>
    Co-authored-by: Jonathan Keane <[email protected]>
    Signed-off-by: Jonathan Keane <[email protected]>
---
 docs/source/cpp/overview.rst            |  3 +++
 docs/source/cpp/streaming_execution.rst | 39 +++++++++++++++++----------------
 2 files changed, 23 insertions(+), 19 deletions(-)

diff --git a/docs/source/cpp/overview.rst b/docs/source/cpp/overview.rst
index ccebdba45d..33f075bd18 100644
--- a/docs/source/cpp/overview.rst
+++ b/docs/source/cpp/overview.rst
@@ -66,6 +66,9 @@ reference.
 **Kernels** are specialized computation functions running in a loop over a
 given set of datums representing input and output parameters to the functions.
 
+**Acero** (pronounced [aˈsɜɹo] / ah-SERR-oh) is a streaming execution engine that allows
+computation to be expressed as a graph of operators which can transform streams of data.
+
 The IO layer
 ------------
 
diff --git a/docs/source/cpp/streaming_execution.rst b/docs/source/cpp/streaming_execution.rst
index 649968ad43..7ce25f587d 100644
--- a/docs/source/cpp/streaming_execution.rst
+++ b/docs/source/cpp/streaming_execution.rst
@@ -19,14 +19,13 @@
 .. highlight:: cpp
 .. cpp:namespace:: arrow::compute
 
-==========================
-Streaming execution engine
-==========================
+=======================================
+Acero: A C++ streaming execution engine
+=======================================
 
 .. warning::
 
-    The streaming execution engine is experimental, and a stable API
-    is not yet guaranteed.
+    Acero is experimental and a stable API is not yet guaranteed.
 
 Motivation
 ==========
@@ -35,20 +34,23 @@ For many complex computations, successive direct :ref:`invocation of
 compute functions <invoking-compute-functions>` is not feasible
 in either memory or computation time. Doing so causes all intermediate
 data to be fully materialized. To facilitate arbitrarily large inputs
-and more efficient resource usage, Arrow also provides a streaming query
-engine with which computations can be formulated and executed.
+and more efficient resource usage, the Arrow C++ implementation also
+provides Acero, a streaming query engine with which computations can
+be formulated and executed.
 
 .. image:: simple_graph.svg
    :alt: An example graph of a streaming execution workflow.
 
-:class:`ExecNode` is provided to reify the graph of operations in a query.
-Batches of data (:struct:`ExecBatch`) flow along edges of the graph from
-node to node. Structuring the API around streams of batches allows the
-working set for each node to be tuned for optimal performance independent
-of any other nodes in the graph. Each :class:`ExecNode` processes batches
-as they are pushed to it along an edge of the graph by upstream nodes
-(its inputs), and pushes batches along an edge of the graph to downstream
-nodes (its outputs) as they are finalized.
+Acero allows computation to be expressed as an "execution plan"
+(:class:`ExecPlan`) which is a directed graph of operators.  Each operator
+(:class:`ExecNode`) provides, transforms, or consumes the data passing
+through it.  Batches of data (:struct:`ExecBatch`) flow along edges of
+the graph from node to node. Structuring the API around streams of batches
+allows the working set for each node to be tuned for optimal performance
+independent of any other nodes in the graph. Each :class:`ExecNode`
+processes batches as they are pushed to it along an edge of the graph by
+upstream nodes (its inputs), and pushes batches along an edge of the graph
+to downstream nodes (its outputs) as they are finalized.
 
 .. seealso::
 
@@ -366,10 +368,9 @@ This function might be reading a file, iterating through an in memory structure,
 from a network connection.  The arrow library refers to these functions as ``arrow::AsyncGenerator``
 and there are a number of utilities for working with these functions.  For this example we use
 a vector of record batches that we've already stored in memory.
-In addition, the schema of the data must be known up front.  Arrow's streaming execution
-engine must know the schema of the data at each stage of the execution graph before any
-processing has begun.  This means we must supply the schema for a source node separately
-from the data itself.
+In addition, the schema of the data must be known up front.  Acero must know the schema of the data
+at each stage of the execution graph before any processing has begun.  This means we must supply the
+schema for a source node separately from the data itself.
 
 Here we define a struct to hold the data generator definition. This includes in-memory batches, schema
 and a function that serves as a data generator :
