askalt opened a new pull request, #19462:
URL: https://github.com/apache/datafusion/pull/19462

   ## Which issue does this PR close?
   
   Closes https://github.com/apache/datafusion/issues/19351
   
   ## What changes are included in this PR?
   
   This patch introduces the stateless physical plan feature. Currently, only 
the physical-plan crate is fully supported. With this feature, a physical plan 
can be reused and executed concurrently.
   
   The feature is gated behind a separate Cargo feature named 
"stateless_plan". The implementation consists of several parts:
   
   ### State Tree
   
   With the "stateless_plan" feature enabled, the plans themselves do not store 
state. The state is stored in a separate tree composed of PlanStateNodes, which 
is built lazily during plan execution. Each node of the tree stores not only 
the shared state of the plan but also its metrics. The shape of the state tree 
matches the shape of the execution plan tree.
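
   The idea can be illustrated with a minimal, self-contained sketch. It is 
not the code from this patch: `PlanNode`, `StateNode`, and `output_rows` are 
hypothetical names chosen for illustration, and a single row counter stands in 
for real metrics. The sketch assumes one state slot per plan child, created on 
first access.

```rust
use std::sync::{Arc, Mutex, OnceLock};

/// Hypothetical immutable plan node: it holds no execution state.
pub struct PlanNode {
    pub name: &'static str,
    pub children: Vec<Arc<PlanNode>>,
}

/// Hypothetical per-execution state node that mirrors one plan node.
pub struct StateNode {
    pub plan: Arc<PlanNode>,
    /// Stand-in for metrics: number of rows produced by this node.
    pub output_rows: Mutex<usize>,
    /// One lazily initialized slot per child of the plan node.
    children: Vec<OnceLock<Arc<StateNode>>>,
}

impl StateNode {
    /// Creates the state for `plan` without creating any child state yet.
    pub fn new_root(plan: Arc<PlanNode>) -> Arc<Self> {
        let children = plan.children.iter().map(|_| OnceLock::new()).collect();
        Arc::new(Self {
            plan,
            output_rows: Mutex::new(0),
            children,
        })
    }

    /// Returns the state of the i-th child, creating it on first access.
    pub fn child(&self, i: usize) -> Arc<StateNode> {
        let child_plan = Arc::clone(&self.plan.children[i]);
        Arc::clone(self.children[i].get_or_init(|| StateNode::new_root(child_plan)))
    }

    /// Reports whether the state of the i-th child has been created.
    pub fn is_child_built(&self, i: usize) -> bool {
        self.children[i].get().is_some()
    }
}
```

   Two executions of the same plan would each start from their own 
`StateNode::new_root(plan)`, so counters updated by one execution are not 
visible to the other.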
   
   ### Metrics
   
   Metrics are stored in the nodes of the state tree and can be accessed after 
plan execution. EXPLAIN output can also be produced from the state tree.
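
   As a rough illustration of reading metrics back after execution, the 
sketch below renders an indented listing from a tree of per-node metrics. 
`NodeMetrics` and `explain_with_state` are hypothetical names, and the output 
format is an assumption rather than the format produced by this patch.

```rust
use std::fmt::Write;

/// Hypothetical snapshot of one state-tree node after execution.
pub struct NodeMetrics {
    pub plan_name: String,
    pub output_rows: usize,
    pub children: Vec<NodeMetrics>,
}

/// Renders one line per node, indented by its depth in the tree.
pub fn explain_with_state(root: &NodeMetrics) -> String {
    fn walk(node: &NodeMetrics, depth: usize, out: &mut String) {
        // Writing to a String cannot fail, so the result is ignored.
        let _ = writeln!(
            out,
            "{}{}: output_rows={}",
            "  ".repeat(depth),
            node.plan_name,
            node.output_rows
        );
        for child in &node.children {
            walk(child, depth + 1, out);
        }
    }

    let mut out = String::new();
    walk(root, 0, &mut out);
    out
}
```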
   
   ### Dynamic Filters
   
   With stateless plans, dynamic filters cannot simply be stored inside the 
plans, because the same plan can be executed concurrently. To overcome this, a 
dynamic filter is split into two parts: a planning-time version and an 
execution-time version. The plans contain the planning-time version, which is 
transformed into the execution-time version during the execution phase and 
then passed from parent nodes to child nodes using the state tree.
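
   The split can be sketched as follows. This is a simplified model under 
stated assumptions, not the types from this patch: `PlannedDynamicFilter`, 
`ExecDynamicFilter`, `instantiate`, `update`, and `keeps` are hypothetical 
names, and the filter is reduced to a single lower bound on one column.

```rust
use std::sync::{Arc, RwLock};

/// Planning-time half: an immutable description stored in the plan.
pub struct PlannedDynamicFilter {
    pub column: &'static str,
}

/// Execution-time half: the mutable bound, owned by a single execution.
pub struct ExecDynamicFilter {
    pub column: &'static str,
    bound: RwLock<Option<i64>>,
}

impl PlannedDynamicFilter {
    /// Creates a fresh execution-time filter; intended to run once per execution.
    pub fn instantiate(&self) -> Arc<ExecDynamicFilter> {
        Arc::new(ExecDynamicFilter {
            column: self.column,
            bound: RwLock::new(None),
        })
    }
}

impl ExecDynamicFilter {
    /// Called by the parent operator when it learns a bound; keeps the tightest one.
    pub fn update(&self, new_bound: i64) {
        let mut bound = self.bound.write().unwrap();
        *bound = Some(match *bound {
            Some(old) => old.max(new_bound),
            None => new_bound,
        });
    }

    /// Called by the child operator: true if `value` passes the current filter.
    pub fn keeps(&self, value: i64) -> bool {
        let bound = *self.bound.read().unwrap();
        bound.map_or(true, |b| value >= b)
    }
}
```

   Because each execution calls `instantiate` on the shared planning-time 
filter, a bound learned in one execution does not prune rows in another.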
   
   ### WorkTable
   
   Instead of explicitly injecting the WorkTable into nodes, RecursiveExec 
exposes the WorkTable in the state it stores within the state tree. A node 
that needs the WorkTable walks up the state tree to retrieve the current one.
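
   The upward lookup can be sketched like this. The types are hypothetical 
stand-ins, not the ones in this patch: this `WorkTable` only buffers integers, 
and `StateNode` keeps a strong reference to its parent for brevity (a real 
tree that also links parents to children would likely need a weak reference 
to avoid cycles).

```rust
use std::sync::{Arc, Mutex};

/// Hypothetical stand-in for the work table: rows from the previous iteration.
pub struct WorkTable {
    pub rows: Mutex<Vec<i64>>,
}

/// Hypothetical state-tree node with a link to its parent.
pub struct StateNode {
    pub name: &'static str,
    pub parent: Option<Arc<StateNode>>,
    /// Set only on the node that belongs to the recursive operator.
    pub work_table: Option<Arc<WorkTable>>,
}

impl StateNode {
    /// Walks from this node towards the root and returns the nearest work table.
    pub fn find_work_table(&self) -> Option<Arc<WorkTable>> {
        let mut current = Some(self);
        while let Some(node) = current {
            if let Some(table) = &node.work_table {
                return Some(Arc::clone(table));
            }
            current = node.parent.as_deref();
        }
        None
    }
}
```

   Returning the nearest ancestor's table means that, with nested recursive 
operators, a reader would see the table of the innermost enclosing one; 
whether that matches the patch's behavior is an assumption of this sketch.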
   
   ## Are these changes tested?
   
   Currently only locally: the patch introduces a new, isolated feature that 
is not yet exercised in CI.
   
   ## Follow-up work
   
   - Support stateless plans in all other DataFusion crates.
   - Enable running tests with this feature in CI.
   - Deprecate stateful plans to eventually transition completely to the 
stateless version.
   - Add `fmt_as_with_state` to allow plans to include state-specific details 
in the EXPLAIN output, such as dynamic filters.
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

