gabotechs opened a new issue, #20396:
URL: https://github.com/apache/datafusion/issues/20396

   ### Is your feature request related to a problem or challenge?
   
   The main purpose of this issue is to gauge whether there's appetite in the community for the ability to attach arbitrary user-provided annotations to `ExecutionPlan`s.
   
   I'm still not sure whether it's a good idea or whether a better alternative already exists, so a perfectly valid answer is that this doesn't fit here.
   
   The idea is: in the same way that we can register our own session-scoped extensions in `SessionConfig`, we could do something similar but scoped to individual `ExecutionPlan`s.
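   For comparison, `SessionConfig` lets callers attach extensions and look them up again by their Rust type (`with_extension` / `get_extension`). A minimal, stdlib-only sketch of that same "one value per type" pattern applied to a single plan node might look like this. `PlanAnnotations` and `MaxWorkers` are hypothetical names used only for illustration; they are not existing DataFusion APIs.

   ```rust
   use std::any::{Any, TypeId};
   use std::collections::HashMap;
   use std::sync::Arc;

   /// Hypothetical per-node annotation store: at most one value per Rust type,
   /// mirroring the look-up-by-type style of session-level extensions.
   #[derive(Default, Clone)]
   pub struct PlanAnnotations {
       inner: HashMap<TypeId, Arc<dyn Any + Send + Sync>>,
   }

   impl PlanAnnotations {
       /// Inserts (or replaces) the annotation of type `T`.
       pub fn insert<T: Any + Send + Sync>(&mut self, value: T) {
           self.inner.insert(TypeId::of::<T>(), Arc::new(value));
       }

       /// Returns the annotation of type `T`, if one was inserted.
       pub fn get<T: Any + Send + Sync>(&self) -> Option<&T> {
           self.inner
               .get(&TypeId::of::<T>())
               .and_then(|value| value.downcast_ref::<T>())
       }
   }

   /// Hypothetical domain-specific annotation.
   #[derive(Debug, PartialEq)]
   pub struct MaxWorkers(pub usize);

   fn main() {
       let mut annotations = PlanAnnotations::default();
       annotations.insert(MaxWorkers(4));
       assert_eq!(annotations.get::<MaxWorkers>(), Some(&MaxWorkers(4)));
       // Types that were never inserted are simply absent.
       assert!(annotations.get::<String>().is_none());
   }
   ```

   Keying by `TypeId` means independent libraries can each define their own annotation types without coordinating on string keys.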
   
   When traversing, modifying, or displaying `ExecutionPlan`s, there is sometimes a need to make per-node decisions based on domain-specific information, for example in https://github.com/datafusion-contrib/datafusion-distributed/blob/main/src/distributed_planner/plan_annotator.rs#L45.
   
   ### Describe the solution you'd like
   
   Probably some methods in the `ExecutionPlan` trait for dealing with arbitrary annotations, something like:
   
   ```rust
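   use std::any::Any;
   use std::sync::Arc;

   trait ExecutionPlan: Send + Sync {
       // ... existing methods ...

       /// Returns the user-provided annotations for this [ExecutionPlan]. Users can use annotations
       /// to inject their own ExecutionPlan-scoped information that can be used for making
       /// decisions while traversing or modifying the plan.
       fn annotations(&self) -> &[Arc<dyn Any + Send + Sync>] {
           &[]
       }

       /// Returns a copy of this [ExecutionPlan] with a new user-provided annotation attached,
       /// or `None` if this node does not support annotations (the default).
       fn with_annotation(
           &self,
           _annotation: Arc<dyn Any + Send + Sync>,
       ) -> Option<Arc<dyn ExecutionPlan>> {
           None
       }
   }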
   ```
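   To make the idea more concrete, here is a self-contained sketch, assuming `with_annotation` returns an annotated copy of the node. It uses a stand-in `ExecutionPlan` trait with only the annotation methods (DataFusion's real trait has many more required methods); `ScanNode`, `EstimatedCost` and the `find_annotation` helper are hypothetical.

   ```rust
   use std::any::Any;
   use std::sync::Arc;

   /// Stand-in for the proposed trait surface only.
   trait ExecutionPlan: Send + Sync {
       fn name(&self) -> &str;

       fn annotations(&self) -> &[Arc<dyn Any + Send + Sync>] {
           &[]
       }

       /// Returns an annotated copy of this node, or `None` if unsupported.
       fn with_annotation(
           &self,
           _annotation: Arc<dyn Any + Send + Sync>,
       ) -> Option<Arc<dyn ExecutionPlan>> {
           None
       }
   }

   /// Hypothetical leaf node that stores its annotations in a Vec.
   #[derive(Clone, Default)]
   struct ScanNode {
       annotations: Vec<Arc<dyn Any + Send + Sync>>,
   }

   impl ExecutionPlan for ScanNode {
       fn name(&self) -> &str {
           "ScanNode"
       }

       fn annotations(&self) -> &[Arc<dyn Any + Send + Sync>] {
           &self.annotations
       }

       fn with_annotation(
           &self,
           annotation: Arc<dyn Any + Send + Sync>,
       ) -> Option<Arc<dyn ExecutionPlan>> {
           let mut annotated = self.clone();
           annotated.annotations.push(annotation);
           Some(Arc::new(annotated))
       }
   }

   /// Hypothetical helper: the first annotation of type `T` attached to `plan`.
   fn find_annotation<T: Any>(plan: &dyn ExecutionPlan) -> Option<&T> {
       plan.annotations()
           .iter()
           .find_map(|annotation| annotation.downcast_ref::<T>())
   }

   /// Hypothetical annotation type.
   #[derive(Debug, PartialEq)]
   struct EstimatedCost(u64);

   fn main() {
       let plan: Arc<dyn ExecutionPlan> = Arc::new(ScanNode::default());
       let plan = plan
           .with_annotation(Arc::new(EstimatedCost(42)))
           .expect("ScanNode supports annotations");

       assert_eq!(plan.name(), "ScanNode");
       assert_eq!(find_annotation::<EstimatedCost>(&*plan), Some(&EstimatedCost(42)));
   }
   ```

   Returning a new node rather than mutating in place keeps plans immutable, which matches how plans are usually rebuilt when they are rewritten.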
   
   But I'm completely open to other ideas (again, a perfectly valid answer is that we should not do this).
   
   ### Describe alternatives you've considered
   
   Custom `ExecutionPlan` implementations that wrap the original plans and add annotations on top.
   
   This is not too bad, but it can lead to verbose implementations and makes downcasting to specific plan types harder, since the wrapper hides the concrete type of the plan it wraps.
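   A minimal sketch of the wrapper approach and its downcasting problem. It uses a stand-in `Plan` trait that mimics the `as_any` hook `ExecutionPlan` provides for downcasting; `FilterNode`, `Annotated` and `peel` are hypothetical names.

   ```rust
   use std::any::Any;
   use std::sync::Arc;

   /// Stand-in for a plan trait that supports downcasting via `as_any`.
   trait Plan: Send + Sync {
       fn as_any(&self) -> &dyn Any;
   }

   /// Hypothetical concrete operator.
   struct FilterNode;

   impl Plan for FilterNode {
       fn as_any(&self) -> &dyn Any {
           self
       }
   }

   /// Hypothetical wrapper that carries an annotation around an inner plan.
   struct Annotated {
       inner: Arc<dyn Plan>,
       note: String,
   }

   impl Plan for Annotated {
       fn as_any(&self) -> &dyn Any {
           self
       }
   }

   /// Peels off any number of wrappers to reach the underlying node.
   fn peel(plan: &dyn Plan) -> &dyn Plan {
       match plan.as_any().downcast_ref::<Annotated>() {
           Some(wrapper) => peel(&*wrapper.inner),
           None => plan,
       }
   }

   fn main() {
       let wrapped: Arc<dyn Plan> = Arc::new(Annotated {
           inner: Arc::new(FilterNode),
           note: "run on 4 workers".to_string(),
       });

       // A direct downcast only sees the wrapper, not the FilterNode inside it.
       assert!(wrapped.as_any().downcast_ref::<FilterNode>().is_none());

       // Reading the annotation requires downcasting to the wrapper type...
       let wrapper = wrapped.as_any().downcast_ref::<Annotated>().unwrap();
       assert_eq!(wrapper.note, "run on 4 workers");

       // ...and every consumer looking for a FilterNode must remember to peel first.
       assert!(peel(&*wrapped).as_any().downcast_ref::<FilterNode>().is_some());
   }
   ```

   Every rule or display routine that inspects concrete node types would need a `peel`-like step, which is where the verbosity comes from.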
   
   ### Additional context
   
   In https://github.com/datafusion-contrib/datafusion-distributed, we need to annotate each individual node in the plan with information such as:
   - how many workers are suitable for executing that node ([link](https://github.com/datafusion-contrib/datafusion-distributed/blob/main/src/distributed_planner/plan_annotator.rs#L54))
   - the computational cost of that node ([link](https://github.com/datafusion-contrib/datafusion-distributed/blob/45aeb0b851ceb8be6a329da5e6a663c7358cd44a/src/distributed_planner/plan_annotator.rs#L60-L60))
   - whether the node should be wrapped with a network boundary ([link](https://github.com/datafusion-contrib/datafusion-distributed/blob/45aeb0b851ceb8be6a329da5e6a663c7358cd44a/src/distributed_planner/plan_annotator.rs#L24))
   
   We rely on a custom `AnnotatedPlan` structure for this ([link](https://github.com/datafusion-contrib/datafusion-distributed/blob/main/src/distributed_planner/plan_annotator.rs#L45-L55)). However, some users would like to provide their own annotations so that the distributed planner can react to their instructions.
   
   The issue with relying on a custom structure rather than a vanilla `ExecutionPlan` is that we cannot thread it across different `PhysicalOptimizerRule`s, since each rule receives and returns a plain `Arc<dyn ExecutionPlan>`. This forces us to have one big optimizer rule that deals with the custom structure internally ([link](https://github.com/datafusion-contrib/datafusion-distributed/blob/45aeb0b851ceb8be6a329da5e6a663c7358cd44a/src/distributed_planner/distributed_physical_optimizer_rule.rs#L42-L42)) instead of multiple, more composable rules that could serve as finer-grained building blocks.
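   A hedged sketch of what composable rules could look like if annotations lived on the plan nodes themselves. To stay self-contained it uses a simplified `Node` struct instead of DataFusion's `ExecutionPlan`; `TaskCount`, `NetworkBoundary`, `assign_task_counts` and `mark_boundaries` are hypothetical, and the "leaves get 4 tasks, add a boundary where the task count changes" logic is invented purely for illustration rather than taken from datafusion-distributed.

   ```rust
   use std::any::Any;
   use std::sync::Arc;

   /// Hypothetical plan node that carries annotations alongside its children.
   #[derive(Clone)]
   struct Node {
       name: &'static str,
       children: Vec<Arc<Node>>,
       annotations: Vec<Arc<dyn Any + Send + Sync>>,
   }

   impl Node {
       fn leaf(name: &'static str) -> Arc<Node> {
           Arc::new(Node { name, children: vec![], annotations: vec![] })
       }

       fn annotation<T: Any>(&self) -> Option<&T> {
           self.annotations.iter().find_map(|a| a.downcast_ref::<T>())
       }
   }

   /// Hypothetical annotations.
   struct TaskCount(usize);
   struct NetworkBoundary;

   /// Rule 1: annotate every node with a task count (leaves get 4, others 1).
   fn assign_task_counts(plan: Arc<Node>) -> Arc<Node> {
       let mut node = (*plan).clone();
       node.children = std::mem::take(&mut node.children)
           .into_iter()
           .map(assign_task_counts)
           .collect();
       let tasks = if node.children.is_empty() { 4 } else { 1 };
       node.annotations.push(Arc::new(TaskCount(tasks)));
       Arc::new(node)
   }

   /// Rule 2: a separate rule that only reads rule 1's annotations and marks
   /// children whose task count differs from their parent's.
   fn mark_boundaries(plan: Arc<Node>) -> Arc<Node> {
       let mut node = (*plan).clone();
       let parent_tasks = node.annotation::<TaskCount>().map(|t| t.0);
       node.children = std::mem::take(&mut node.children)
           .into_iter()
           .map(|child| {
               let child = mark_boundaries(child);
               if child.annotation::<TaskCount>().map(|t| t.0) != parent_tasks {
                   let mut marked = (*child).clone();
                   marked.annotations.push(Arc::new(NetworkBoundary));
                   Arc::new(marked)
               } else {
                   child
               }
           })
           .collect();
       Arc::new(node)
   }

   fn main() {
       let plan = Arc::new(Node {
           name: "aggregate",
           children: vec![Node::leaf("scan")],
           annotations: vec![],
       });

       // The rules compose because both take and return the same plan type.
       let plan = mark_boundaries(assign_task_counts(plan));

       let scan = &plan.children[0];
       assert_eq!(plan.annotation::<TaskCount>().map(|t| t.0), Some(1));
       assert_eq!(scan.annotation::<TaskCount>().map(|t| t.0), Some(4));
       assert!(scan.annotation::<NetworkBoundary>().is_some());
       assert!(plan.annotation::<NetworkBoundary>().is_none());
       println!("{} <- {}", plan.name, scan.name);
   }
   ```

   Because each rule only depends on the plan type and on the annotation types it reads, the rules could be registered, reordered, or replaced independently instead of living inside one monolithic rule.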


-- 
This is an automated message from the Apache Git Service.