gabotechs opened a new issue, #20396: URL: https://github.com/apache/datafusion/issues/20396
### Is your feature request related to a problem or challenge?

The main purpose of this issue is to gauge whether there's appetite in the community for the ability to attach arbitrary user-provided annotations to `ExecutionPlan`s. I'm still not sure whether it's a good idea or whether a better alternative already exists, so a perfectly valid answer is that this doesn't fit here.

The idea is: the same way we can introduce our own session-scoped extensions in `SessionConfig`, we would be able to do something similar, but scoped to individual `ExecutionPlan`s.

While traversing, modifying or displaying `ExecutionPlan`s, there is sometimes a need to make per-node decisions based on domain-specific information, for example in https://github.com/datafusion-contrib/datafusion-distributed/blob/main/src/distributed_planner/plan_annotator.rs#L45.

### Describe the solution you'd like

Probably some methods in the `ExecutionPlan` trait for dealing with arbitrary annotations, something like:

```rust
trait ExecutionPlan {
    /// Returns the user-provided annotations for this [ExecutionPlan]. Users can use annotations
    /// for injecting their own ExecutionPlan-scope information that can be used for making
    /// decisions while traversing or modifying the plan.
    fn annotations(&self) -> &[Arc<dyn Any>] {
        &[]
    }

    /// Adds a new user-provided annotation to this [ExecutionPlan].
    fn with_annotation(&self, annotation: Arc<dyn Any>) {
        // Do nothing
    }
}
```

But I'm completely open to other ideas (again, a perfectly valid answer to this is that we should not do this).

### Describe alternatives you've considered

Have custom `ExecutionPlan` implementations that wrap the original plans, adding annotations on top. This is not too bad, but it can lead to verbose implementations and makes downcasting to specific plan types harder.
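Compared with wrapper plans, the annotation approach proposed above might look roughly like the sketch below. It is only an illustration and does not use DataFusion: `PlanNode`, `Scan`, `TaskCount`, `NeedsNetworkBoundary`, `annotation_of` and the two rule functions are all hypothetical names. It also assumes that annotations are `Send + Sync` and that `with_annotation` returns a new node instead of mutating in place, neither of which is settled in this proposal.

```rust
use std::any::Any;
use std::sync::Arc;

/// Assumption: annotations are shareable across threads.
type Annotation = Arc<dyn Any + Send + Sync>;

/// Hypothetical stand-in for `ExecutionPlan`; only the annotation
/// methods discussed in this issue are modeled.
trait PlanNode {
    fn name(&self) -> &str;
    /// User-provided annotations attached to this node.
    fn annotations(&self) -> &[Annotation];
    /// Assumption: returns a copy of the node with one more annotation.
    fn with_annotation(&self, annotation: Annotation) -> Arc<dyn PlanNode>;
}

/// Returns the first annotation of concrete type `T`, if any.
fn annotation_of<T: Any>(plan: &dyn PlanNode) -> Option<&T> {
    plan.annotations().iter().find_map(|a| a.downcast_ref::<T>())
}

/// Hypothetical leaf node.
struct Scan {
    table: String,
    annotations: Vec<Annotation>,
}

impl PlanNode for Scan {
    fn name(&self) -> &str {
        &self.table
    }

    fn annotations(&self) -> &[Annotation] {
        &self.annotations
    }

    fn with_annotation(&self, annotation: Annotation) -> Arc<dyn PlanNode> {
        // Cloning the Vec only bumps the Arc reference counts.
        let mut annotations = self.annotations.clone();
        annotations.push(annotation);
        Arc::new(Scan { table: self.table.clone(), annotations })
    }
}

/// Hypothetical domain-specific annotations.
#[derive(Debug, PartialEq)]
struct TaskCount(usize);

#[derive(Debug, PartialEq)]
struct NeedsNetworkBoundary(bool);

/// First pass: records how many tasks could run this node.
/// The value 4 is an arbitrary placeholder.
fn annotate_task_count(plan: &dyn PlanNode) -> Arc<dyn PlanNode> {
    plan.with_annotation(Arc::new(TaskCount(4)))
}

/// Second pass: reads the annotation left by the first pass and
/// derives a new one from it.
fn annotate_boundary(plan: &dyn PlanNode) -> Arc<dyn PlanNode> {
    let distributed = annotation_of::<TaskCount>(plan).map_or(false, |t| t.0 > 1);
    plan.with_annotation(Arc::new(NeedsNetworkBoundary(distributed)))
}
```

Chaining `annotate_task_count` and then `annotate_boundary` on a `Scan` leaves both annotations on the resulting node, and each one can be read back with `annotation_of::<T>` without downcasting the node itself.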
### Additional context

In https://github.com/datafusion-contrib/datafusion-distributed, we need to annotate each individual node in the plan with information like:

- how many workers are suitable to execute that node ([link](https://github.com/datafusion-contrib/datafusion-distributed/blob/main/src/distributed_planner/plan_annotator.rs#L54))
- what the computational cost of that node is ([link](https://github.com/datafusion-contrib/datafusion-distributed/blob/45aeb0b851ceb8be6a329da5e6a663c7358cd44a/src/distributed_planner/plan_annotator.rs#L60-L60))
- whether the node should be wrapped with a network boundary ([link](https://github.com/datafusion-contrib/datafusion-distributed/blob/45aeb0b851ceb8be6a329da5e6a663c7358cd44a/src/distributed_planner/plan_annotator.rs#L24))

We rely on a custom `AnnotatedPlan` structure for this ([link](https://github.com/datafusion-contrib/datafusion-distributed/blob/main/src/distributed_planner/plan_annotator.rs#L45-L55)). However, some users would like to provide their own annotations so that the distributed planner can react to their instructions.

The issue with relying on a custom structure rather than a vanilla `ExecutionPlan` is that we cannot thread it across different `PhysicalOptimizerRule`s. This forces us to have a single big optimizer rule that deals with the custom structure internally ([link](https://github.com/datafusion-contrib/datafusion-distributed/blob/45aeb0b851ceb8be6a329da5e6a663c7358cd44a/src/distributed_planner/distributed_physical_optimizer_rule.rs#L42-L42)), instead of multiple, more composable rules that can be used as finer-grained building blocks.
