jayzhan211 commented on issue #9972:
URL:
https://github.com/apache/arrow-datafusion/issues/9972#issuecomment-2047471679
@alamb
I plan to move out the code that computes `required_input_ordering` for
AggregateExec to the optimizer. Because that is the part contains
`get_aggregate_exprs_requirement`. I name the rule `simplify_ordering`.
The problem is that I need to ensure the rule `simplify_ordering` is applied
somewhere after `AggregateExec::try_new_with_schema` and before
`required_input_ordering` is required. Since many rules include
`AggregateExec::try_new_with_schema` I end up inserting `simplify_ordering`
after many rules. The worst is that the rule `EnforceDistribution` expects the
`simplify_ordering` be applied between the rule.
```rust
let adjusted = if top_down_join_key_reordering {
// Run a top-down process to adjust input key ordering
recursively
let plan_requirements =
PlanWithKeyRequirements::new_default(plan);
let adjusted = plan_requirements
.transform_down(&adjust_input_keys_ordering)
.data()?;
adjusted.plan
} else {
// Run a bottom-up process
plan.transform_up(&|plan| {
Ok(Transformed::yes(reorder_join_keys_to_inputs(plan)?))
})
.data()?
};
// Expect simply_ordering here before `ensure_distribution` is called.
let distribution_context =
DistributionContext::new_default(adjusted);
// Distribution enforcement needs to be applied bottom-up.
let distribution_context = distribution_context
.transform_up(&|distribution_context| {
ensure_distribution(distribution_context, config)
})
.data()?;
Ok(distribution_context.plan)
```
I need to split this rule into two, so I can insert the `simplify_ordering`
in between. And, I think there is something wrong 😢
The other approach is instead of having `simplify_ordering` as a single
rule, have it like a function for each rule, but I think either way is hard to
maintain, we still need to think about **where** to insert the rule.
Since the `required_input_ordering` is integrated deeply with
`AggregateExec` creation, pulling it out as a rule ends up we need to ensure
the rule is applied in the correct place, which seems like a bad design for me.
I have no good idea currently, leave this note here and see if you or others
have any good idea for it.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]