askalt commented on PR #19462: URL: https://github.com/apache/datafusion/pull/19462#issuecomment-3744263074
> I agree we would need some way / API in the dynamic predicates to update the pointers to the new operator instance, as part of running the plan again 🤔

Yes, and the same applies to the `WorkTable` from `RecursiveExec`: the table should also be re-created for each execution. The suggested approach solves all of these challenges by placing such state within the state tree itself.

> However, I am concerned that the practical ability to actually migrate the codebase (and all the consumers of DataFusion) to this pattern.

Currently, a consumer needs to take the following steps (for each `ExecutionPlan`) to migrate to the stateless plans suggested in the patch:

1) If the plan stores metrics, remove them and use `state.get_metrics(...)` within `execute(...)`.
2) If the plan stores state, create a separate structure `MyExecState`, move the state into it, implement `as_any(...)` (and possibly `dynamic_filters(...)`), and then acquire and use this state via `state.get_state::<MyExecState>()` in `execute(...)`.

As an example, see `HashJoinExec`: https://github.com/askalt/datafusion/blob/4275e0264a61fac347530b6a393ef7114a2f8767/datafusion/physical-plan/src/joins/hash_join/exec.rs#L395-L408

Regarding the DF crates: I can migrate them myself (and in fact, I have already done so in a separate branch).

Do you feel this is still an excessive number of changes?
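
To make the two migration steps more concrete, here is a minimal, standard-library-only sketch of the pattern. It does not use the actual API from the patch: `PlanState`, `MyExec`, `make_state`, and the simplified signatures of `get_state` and `get_metrics` are hypothetical stand-ins, and the real types in the PR may look quite different. The sketch only illustrates the idea that the plan node keeps immutable configuration while per-execution state and metrics live in a separate object handed to `execute`.

```rust
use std::any::Any;
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical stand-in for one node of the per-execution state tree.
/// The real type in the PR and its method signatures may differ.
struct PlanState {
    /// Type-erased state owned by a single execution of one plan node.
    node_state: Box<dyn Any + Send + Sync>,
    /// Hypothetical metric: rows produced during this execution.
    output_rows: AtomicUsize,
}

impl PlanState {
    fn new<S: Any + Send + Sync>(state: S) -> Self {
        Self {
            node_state: Box::new(state),
            output_rows: AtomicUsize::new(0),
        }
    }

    /// Mirrors the idea of `state.get_state::<MyExecState>()`:
    /// returns `None` if the stored state has a different type.
    fn get_state<S: Any>(&self) -> Option<&S> {
        self.node_state.downcast_ref::<S>()
    }

    /// Mirrors the idea of `state.get_metrics(...)` (step 1).
    fn get_metrics(&self) -> &AtomicUsize {
        &self.output_rows
    }
}

/// Step 2: mutable state is moved out of the plan into its own struct.
struct MyExecState {
    batches_seen: AtomicUsize,
}

/// The plan node itself keeps only immutable configuration.
struct MyExec {
    batch_size: usize,
}

impl MyExec {
    /// Every execution gets a fresh state object (hypothetical helper).
    fn make_state(&self) -> PlanState {
        PlanState::new(MyExecState {
            batches_seen: AtomicUsize::new(0),
        })
    }

    /// Simplified `execute`: reads state and metrics from `state`,
    /// never from fields on `self`. Returns the number of batches
    /// seen so far in this execution.
    fn execute(&self, state: &PlanState) -> usize {
        let my_state = state
            .get_state::<MyExecState>()
            .expect("state type mismatch");
        let seen = my_state.batches_seen.fetch_add(1, Ordering::Relaxed) + 1;
        state
            .get_metrics()
            .fetch_add(self.batch_size, Ordering::Relaxed);
        seen
    }
}

fn main() {
    let plan = MyExec { batch_size: 8192 };

    // Two executions of the same plan do not share state or metrics.
    let first = plan.make_state();
    let second = plan.make_state();
    plan.execute(&first);
    plan.execute(&first);
    plan.execute(&second);

    println!("first run rows: {}", first.get_metrics().load(Ordering::Relaxed));
    println!("second run rows: {}", second.get_metrics().load(Ordering::Relaxed));
}
```

Because `MyExec` holds no mutable fields in this sketch, the same plan instance can be executed repeatedly, and each run starts from a clean `MyExecState` without any pointer fix-up on the plan itself.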
