jayzhan211 commented on issue #9972:
URL: 
https://github.com/apache/arrow-datafusion/issues/9972#issuecomment-2047471679

   @alamb 
   
   I plan to move out the code that computes `required_input_ordering` for 
AggregateExec to the optimizer. Because that is the part contains 
`get_aggregate_exprs_requirement`. I name the rule `simplify_ordering`. 
   
   The problem is that I need to ensure the rule `simplify_ordering` is applied 
somewhere after `AggregateExec::try_new_with_schema` and before 
`required_input_ordering` is required. Since many rules include 
`AggregateExec::try_new_with_schema` I end up inserting `simplify_ordering` 
after many rules. The worst is that the rule `EnforceDistribution` expects the 
`simplify_ordering` be applied between the rule.
   
   ```rust
           let adjusted = if top_down_join_key_reordering {
               // Run a top-down process to adjust input key ordering 
recursively
               let plan_requirements = 
PlanWithKeyRequirements::new_default(plan);
               let adjusted = plan_requirements
                   .transform_down(&adjust_input_keys_ordering)
                   .data()?;
               adjusted.plan
           } else {
               // Run a bottom-up process
               plan.transform_up(&|plan| {
                   Ok(Transformed::yes(reorder_join_keys_to_inputs(plan)?))
               })
               .data()?
           };
   
         // Expect simply_ordering here before `ensure_distribution` is called.
   
           let distribution_context = 
DistributionContext::new_default(adjusted);
           // Distribution enforcement needs to be applied bottom-up.
           let distribution_context = distribution_context
               .transform_up(&|distribution_context| {
                   ensure_distribution(distribution_context, config)
               })
               .data()?;
           Ok(distribution_context.plan)
   ```
   
   I need to split this rule into two, so I can insert the `simplify_ordering` 
in between. And, I think there is something wrong 😢 
   
   The other approach is instead of having `simplify_ordering` as a single 
rule, have it like a function for each rule, but I think either way is hard to 
maintain, we still need to think about **where** to insert the rule.
   
   
   Since the `required_input_ordering` is integrated deeply with 
`AggregateExec` creation, pulling it out as a rule ends up we need to ensure 
the rule is applied in the correct place, which seems like a bad design for me. 
   
   I have no good idea currently, leave this note here and see if you or others 
have any good idea for it.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to