mingmwang commented on code in PR #4455:
URL: https://github.com/apache/arrow-datafusion/pull/4455#discussion_r1037173154
##########
datafusion/core/src/physical_plan/windows/window_agg_exec.rs:
##########
@@ -116,14 +150,38 @@ impl ExecutionPlan for WindowAggExec {
/// Get the output partitioning of this plan
fn output_partitioning(&self) -> Partitioning {
- // because we can have repartitioning using the partition keys
- // this would be either 1 or more than 1 depending on the presense of
- // repartitioning
- self.input.output_partitioning()
+ // Although WindowAggExec does not change the output partitioning from the input, but can not return the output partitioning
+ // from the input directly, need to adjust the column index to align with the new schema.
+ let window_expr_len = self.window_expr.len();
+ let input_partitioning = self.input.output_partitioning();
+ match input_partitioning {
+ Partitioning::RoundRobinBatch(size) => Partitioning::RoundRobinBatch(size),
+ Partitioning::UnknownPartitioning(size) => {
+ Partitioning::UnknownPartitioning(size)
+ }
+ Partitioning::Hash(exprs, size) => {
+ let new_exprs = exprs
+ .into_iter()
+ .map(|expr| {
+ expr.transform_down(
Review Comment:
Sure. If I understand correctly, this is very similar to choosing between
SortMergeJoin and HashJoin as the physical join implementation: if the input
plan can already produce the required ordering, prefer SortMergeJoin. This kind
of optimization is usually done in a rule.
You can also implement this rule in a more cost-based manner. For each
WindowAggExec, you can choose either WindowAggExec or your pipelined version.
The pipelined version has ordering requirements on its input, and the
Enforcement rule will add a SortExec when the input plan's ordering cannot
satisfy them. At the end, count the number of SortExecs in each candidate plan
tree; the tree with the fewest SortExecs is the best plan. Alternatively, you
can tolerate one or two additional SortExecs and still prefer the pipelined
version.
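The selection described above could be sketched roughly as follows. This is a
toy model only: `Plan`, `enforce_ordering`, `count_sorts`, and
`choose_window_plan` are hypothetical names and do not come from DataFusion,
and the sketch assumes that the regular window operator has no ordering
requirement while the pipelined one requires sorted input.

```rust
// Toy plan tree. These types are illustrative assumptions and are NOT
// DataFusion's ExecutionPlan API.
#[derive(Debug, Clone, PartialEq)]
enum Plan {
    Scan { sorted: bool },
    Sort(Box<Plan>),
    WindowAgg(Box<Plan>),          // assumed: no ordering requirement
    PipelinedWindowAgg(Box<Plan>), // assumed: requires sorted input
}

// Whether a node's output is sorted. In this toy model both window
// operators are assumed to preserve the ordering of their input.
fn output_is_sorted(plan: &Plan) -> bool {
    match plan {
        Plan::Scan { sorted } => *sorted,
        Plan::Sort(_) => true,
        Plan::WindowAgg(input) | Plan::PipelinedWindowAgg(input) => output_is_sorted(input),
    }
}

// Stand-in for an enforcement rule: insert a Sort below every pipelined
// window operator whose input does not already provide an ordering.
fn enforce_ordering(plan: Plan) -> Plan {
    match plan {
        Plan::Scan { sorted } => Plan::Scan { sorted },
        Plan::Sort(input) => Plan::Sort(Box::new(enforce_ordering(*input))),
        Plan::WindowAgg(input) => Plan::WindowAgg(Box::new(enforce_ordering(*input))),
        Plan::PipelinedWindowAgg(input) => {
            let input = enforce_ordering(*input);
            let input = if output_is_sorted(&input) {
                input
            } else {
                Plan::Sort(Box::new(input))
            };
            Plan::PipelinedWindowAgg(Box::new(input))
        }
    }
}

// Count the Sort nodes in a plan tree.
fn count_sorts(plan: &Plan) -> usize {
    match plan {
        Plan::Scan { .. } => 0,
        Plan::Sort(input) => 1 + count_sorts(input),
        Plan::WindowAgg(input) | Plan::PipelinedWindowAgg(input) => count_sorts(input),
    }
}

// Build both candidates, enforce ordering on each, and prefer the pipelined
// one unless it needs more than `tolerance` extra sorts.
fn choose_window_plan(input: Plan, tolerance: usize) -> Plan {
    let regular = enforce_ordering(Plan::WindowAgg(Box::new(input.clone())));
    let pipelined = enforce_ordering(Plan::PipelinedWindowAgg(Box::new(input)));
    if count_sorts(&pipelined) <= count_sorts(&regular) + tolerance {
        pipelined
    } else {
        regular
    }
}
```

With `tolerance = 0` the pipelined operator is chosen only when it adds no
extra sort; a tolerance of 1 or 2 corresponds to accepting a few additional
SortExecs in exchange for pipelining.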