askalt opened a new pull request, #19462:
URL: https://github.com/apache/datafusion/pull/19462
## Which issue does this PR close?

Closes https://github.com/apache/datafusion/issues/19351

## What changes are included in this PR?

This patch introduces the stateless physical plan feature. Currently, the `physical-plan` crate is fully supported. The feature allows physical plans to be reused and executed concurrently.

It is implemented behind a separate Cargo feature named `stateless_plan`. The implementation consists of several parts:

### State tree

With the `stateless_plan` feature enabled, plans do not store state themselves. Instead, the state is stored in a separate tree composed of `PlanStateNode`s, which is built lazily during plan execution. Each node of the tree stores not only the shared state of its plan but also the plan's metrics. The shape of the state tree matches the shape of the execution plan tree.

### Metrics

Metrics are stored in the nodes of the state tree and can be accessed after plan execution. `EXPLAIN` output can also be produced from the state.

### Dynamic filters

With stateless plans, dynamic filters cannot simply be stored inside the plans, because the same plan can be executed concurrently. To overcome this, a dynamic filter is split into two parts: a planning-time version and an execution-time version. The plans contain the planning-time version, which is transformed into the execution-time version during the execution phase and then passed from parent nodes to child nodes through the state tree.

### WorkTable

Instead of explicitly injecting the `WorkTable` into nodes, `RecursiveExec` exposes the `WorkTable` in its state stored within the state tree. A node that needs the `WorkTable` traverses up the state tree and thus retrieves the current `WorkTable`.

## Are these changes tested?

Currently only locally, as the patch introduces a new isolated feature that is not yet tested in CI.

## Following work

- Support stateless plans in all other DataFusion crates.
- Enable running tests with this feature in CI.
- Deprecate stateful plans to eventually transition completely to the stateless version.
- Add `fmt_as_with_state` to allow plans to include state-specific details, such as dynamic filters, in the `EXPLAIN` output.
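To illustrate the state tree described in the State tree and Metrics sections, here is a minimal, self-contained Rust sketch. It is not DataFusion's API: apart from the name `PlanStateNode`, which comes from this PR, every type, field and function here (`PlanNode`, `output_rows`, `child`, `is_child_initialized`, `execute`) is hypothetical. It assumes a child's state node can be created on first access with `std::sync::OnceLock`, which keeps the state tree the same shape as the plan tree and lets one shared, immutable plan be executed through several independent state trees.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, OnceLock};

/// Hypothetical immutable plan node. It holds no per-execution state, so one
/// `Arc<PlanNode>` can be executed many times, including concurrently.
pub struct PlanNode {
    pub name: &'static str,
    pub children: Vec<Arc<PlanNode>>,
}

/// Simplified per-execution state for one plan node (illustrative fields only).
pub struct PlanStateNode {
    pub plan: Arc<PlanNode>,
    /// Example metric owned by this execution: rows produced by this node.
    pub output_rows: AtomicUsize,
    /// One slot per plan child; each slot is filled lazily on first access.
    children: Vec<OnceLock<Arc<PlanStateNode>>>,
}

impl PlanStateNode {
    /// Creates the root of a new, still empty state tree for `plan`.
    pub fn new_root(plan: Arc<PlanNode>) -> Arc<Self> {
        Arc::new(Self::new(plan))
    }

    fn new(plan: Arc<PlanNode>) -> Self {
        let children: Vec<OnceLock<Arc<PlanStateNode>>> =
            plan.children.iter().map(|_| OnceLock::new()).collect();
        Self { plan, output_rows: AtomicUsize::new(0), children }
    }

    /// Returns the state node of the `i`-th child, creating it on first use,
    /// so the state tree mirrors the shape of the plan tree.
    pub fn child(&self, i: usize) -> Arc<PlanStateNode> {
        self.children[i]
            .get_or_init(|| Arc::new(PlanStateNode::new(Arc::clone(&self.plan.children[i]))))
            .clone()
    }

    /// True once the `i`-th child's state node has been created.
    pub fn is_child_initialized(&self, i: usize) -> bool {
        self.children[i].get().is_some()
    }
}

/// Toy executor: every leaf "produces" 10 rows and every inner node sums its
/// children. Row counts are recorded in the state tree, never in the plan.
pub fn execute(state: &PlanStateNode) -> usize {
    let rows = if state.plan.children.is_empty() {
        10
    } else {
        (0..state.plan.children.len())
            .map(|i| execute(&state.child(i)))
            .sum::<usize>()
    };
    state.output_rows.fetch_add(rows, Ordering::Relaxed);
    rows
}
```

For example, executing the same `Arc<PlanNode>` through two roots created with `PlanStateNode::new_root` on two threads gives each execution its own `output_rows` metric, which can be read after execution, while the plan itself is never mutated.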
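The planning-time / execution-time split described in the Dynamic filters section can be sketched as follows. This is an assumption-laden illustration, not the PR's implementation: `PlanningDynamicFilter`, `ExecutionDynamicFilter`, `to_execution`, `update` and `keep` are hypothetical names, and the filter is reduced to a single upper bound on one integer column. Cloning the execution-time handle stands in for passing it from a parent node to a child node through the state tree.

```rust
use std::sync::{Arc, RwLock};

/// Hypothetical planning-time filter, stored inside the immutable plan.
/// It only describes which column to filter and never holds runtime values.
#[derive(Clone)]
pub struct PlanningDynamicFilter {
    pub column: usize,
}

/// Hypothetical execution-time filter: created once per execution and shared
/// by the producer (e.g. a TopK or join parent) and its consumer child.
#[derive(Clone)]
pub struct ExecutionDynamicFilter {
    pub column: usize,
    /// Current upper bound; `None` means nothing is filtered yet.
    upper_bound: Arc<RwLock<Option<i64>>>,
}

impl PlanningDynamicFilter {
    /// Derives a fresh execution-time filter with its own shared bound.
    pub fn to_execution(&self) -> ExecutionDynamicFilter {
        ExecutionDynamicFilter {
            column: self.column,
            upper_bound: Arc::new(RwLock::new(None)),
        }
    }
}

impl ExecutionDynamicFilter {
    /// Producer side: tighten the bound; a looser bound is ignored.
    pub fn update(&self, bound: i64) {
        let mut guard = self.upper_bound.write().unwrap();
        let current = *guard;
        *guard = Some(match current {
            Some(b) => b.min(bound),
            None => bound,
        });
    }

    /// Consumer side: returns whether this row passes the current bound.
    pub fn keep(&self, row: &[i64]) -> bool {
        match *self.upper_bound.read().unwrap() {
            Some(b) => row[self.column] <= b,
            None => true,
        }
    }
}
```

Because each call to `to_execution` allocates a fresh shared bound, two concurrent executions of the same plan cannot observe each other's filter values, while a parent and child within one execution share the same bound.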
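The WorkTable lookup from the WorkTable section can be pictured as a walk from a state node towards the root of the state tree. In this hypothetical sketch, `StateNode`, `WorkTable`, `find_work_table` and the parent pointer are illustrative names, not DataFusion types; a node playing the role of `RecursiveExec` is simply a state node that carries `Some(work_table)`.

```rust
use std::sync::{Arc, Mutex};

/// Hypothetical stand-in for a recursive query's work table.
#[derive(Default)]
pub struct WorkTable {
    pub rows: Mutex<Vec<i64>>,
}

/// Simplified state node: knows its parent and may expose a work table.
pub struct StateNode {
    pub name: &'static str,
    parent: Option<Arc<StateNode>>,
    work_table: Option<Arc<WorkTable>>,
}

impl StateNode {
    pub fn new(
        name: &'static str,
        parent: Option<Arc<StateNode>>,
        work_table: Option<Arc<WorkTable>>,
    ) -> Arc<Self> {
        Arc::new(Self { name, parent, work_table })
    }

    /// Walks up from this node (inclusive) and returns the nearest work table.
    pub fn find_work_table(&self) -> Option<Arc<WorkTable>> {
        let mut current = Some(self);
        while let Some(node) = current {
            if let Some(wt) = &node.work_table {
                return Some(Arc::clone(wt));
            }
            current = node.parent.as_deref();
        }
        None
    }
}
```

Returning the nearest work table means a scan inside a nested recursive query sees the inner query's table. The sketch holds parents through `Arc` for brevity; a tree whose parents also own their children would normally use `std::sync::Weak` for the upward link to avoid reference cycles.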
