Weston Pace created ARROW-16522:
-----------------------------------
Summary: [C++] Evolution of exec plan
Key: ARROW-16522
URL: https://issues.apache.org/jira/browse/ARROW-16522
Project: Apache Arrow
Issue Type: Improvement
Components: C++
Reporter: Weston Pace
This is a high level JIRA that will be broken into a number of subtasks. There
are a number of things that each node has to reinvent (or, more likely,
copy/paste) and it we probably know enough at this point that we can enhance
the ExecNode/ExecPlan API a bit to move some of the burden out of the nodes and
into the plan. At a high level:
* All nodes that schedule tasks use an async task group so they can make sure
that all scheduled tasks are finished before the node is marked finish. Task
scheduling could happen at the plan level and there would be one spot keeping
track of running tasks.
* If we have centralized scheduling then that opens up the door for a
centralized handling of errors and backpressure. A node's default error
behavior is then "stop scheduling tasks" and a node's default backpressure
behavior is "stop running tasks".
* There are no nodes that emit to multiple outputs today. Sending data to
multiple outputs is not simple. This is a pipeline breaking activity and new
tasks will need to be scheduled and the data will need to be held onto until
each downstream task has run, and backpressure may need to be applied if one
output is slower than the other. Node's should not be given the choice of
multiple outputs unless they are solving specifically that problem (e.g.
temporary table).
--
This message was sent by Atlassian Jira
(v8.20.7#820007)