mustafasrepo commented on issue #9011:
URL:
https://github.com/apache/arrow-datafusion/issues/9011#issuecomment-1914674622
As far as I can tell, the physical plan produced is correct in terms of
partitioning requirements:
```
PartitionedAggregateExec, metrics=[]
  CoalesceBatchesExec: target_batch_size=8192, metrics=[]
    RepartitionExec: partitioning=Hash([project_id@0, user_id@1], 12), input_partitions=12, metrics=[]
      RepartitionExec: partitioning=RoundRobinBatch(12), input_partitions=1, metrics=[]
        CoalescePartitionsExec, metrics=[]
          ProjectionExec: expr=[project_id@0 as project_id, user_id@1 as user_id, created_at@2 as created_at, event@4 as event], metrics=[]
            CoalesceBatchesExec: target_batch_size=8192, metrics=[]
              FilterExec: project_id@0 = 1 AND created_at@2 >= 1705419428144118000 AND created_at@2 <= 1706283428144118000 AND event@4 = 13, metrics=[]
                RepartitionExec: partitioning=RoundRobinBatch(12), input_partitions=1, metrics=[]
                  ParquetExec: file_groups={1 group: [[file.parquet]]}, projection=[project_id, user_id, created_at, event_id, event, str_0], predicate=project_id@0 = 1 AND created_at@2 >= 1705419428144118000 AND created_at@2 <= 1706283428144118000 AND event@4 = 13, pruning_predicate=project_id_min@0 <= 1 AND 1 <= project_id_max@1 AND created_at_max@2 >= 1705419428144118000 AND created_at_min@3 <= 1706283428144118000 AND event_min@4 <= 13 AND 13 <= event_max@5, metrics=[num_predicate_creation_errors=0]
```
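where
`RepartitionExec: partitioning=Hash([project_id@0, user_id@1], 12), input_partitions=12, metrics=[]`
is added to satisfy the partitioning requirement of the
`PartitionedAggregateExec`. However, the plan is odd: the intermediate
`CoalescePartitionsExec` merges the 12 filtered partitions into one, only for
the next `RepartitionExec` to split them back into 12 round-robin partitions
before hashing. We should be able to generate a better plan in which that
intermediate `CoalescePartitionsExec` is removed.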
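To make the redundancy concrete, here is a minimal sketch on a hypothetical, simplified model: the plan is a top-down list of operators (the `Op` enum and `remove_merge_then_split` are illustrative names, not DataFusion's API or its optimizer). It drops a round-robin split whose child is a partition merge whenever the parent re-hashes anyway, which is the situation in the plan above.

```rust
/// Hypothetical, simplified model of a linear physical plan, listed top-down
/// (parent first) like the EXPLAIN output. Not DataFusion's API.
#[derive(Debug, Clone, PartialEq)]
enum Op {
    PartitionedAggregate,
    CoalesceBatches,
    HashRepartition(usize),
    RoundRobinRepartition(usize),
    CoalescePartitions,
    Projection,
    Filter,
    ParquetScan,
}

/// Removes a `RoundRobinRepartition` + `CoalescePartitions` pair sitting
/// directly under a `HashRepartition`: the pair merges partitions into one and
/// splits them again, but the hash repartition redistributes every row anyway,
/// so it can consume the original partitions directly.
fn remove_merge_then_split(plan: &[Op]) -> Vec<Op> {
    let mut out: Vec<Op> = Vec::with_capacity(plan.len());
    let mut i = 0;
    while i < plan.len() {
        let parent_rehashes = matches!(out.last(), Some(Op::HashRepartition(_)));
        let is_pair = matches!(plan[i], Op::RoundRobinRepartition(_))
            && plan.get(i + 1) == Some(&Op::CoalescePartitions);
        if parent_rehashes && is_pair {
            i += 2; // skip both the split and the merge
        } else {
            out.push(plan[i].clone());
            i += 1;
        }
    }
    out
}

fn main() {
    // The plan from this comment, top-down.
    let plan = vec![
        Op::PartitionedAggregate,
        Op::CoalesceBatches,
        Op::HashRepartition(12),
        Op::RoundRobinRepartition(12),
        Op::CoalescePartitions,
        Op::Projection,
        Op::CoalesceBatches,
        Op::Filter,
        Op::RoundRobinRepartition(12),
        Op::ParquetScan,
    ];
    for op in remove_merge_then_split(&plan) {
        println!("{:?}", op);
    }
}
```

The lower `RoundRobinRepartition(12)` above the scan is kept, since it is what gives the filter and projection their 12-way parallelism.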
Additionally, there is an API, `required_input_ordering`, which should be
implemented if an operator depends on any input ordering. If
`PartitionedAggregateExec` depends on any ordering (such as the order of its
partition expressions), you have to declare this dependency through this API.
This might be the root cause of why the `Sort` is removed from the `PhysicalPlan`.
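The effect can be illustrated with a toy model. In DataFusion, `required_input_ordering` is a method on the `ExecutionPlan` trait that reports an optional ordering requirement per child; the sketch below is a hypothetical, simplified stand-in (the `Operator` trait, the string column names and `keep_sort_under` are assumptions for illustration, not DataFusion's real types). It shows why a sort-removal pass treats a `Sort` as unnecessary when the parent never declares that it needs one.

```rust
/// Hypothetical toy trait, NOT DataFusion's `ExecutionPlan`.
trait Operator {
    fn name(&self) -> &str;
    /// One entry per child: `Some(columns)` if that child's output must be
    /// sorted on `columns`, `None` if any order is acceptable. The default
    /// models an operator that never declared its ordering dependency.
    fn required_input_ordering(&self) -> Vec<Option<Vec<String>>> {
        vec![None]
    }
}

struct AggregateNoDecl;
impl Operator for AggregateNoDecl {
    fn name(&self) -> &str {
        "PartitionedAggregateExec (no ordering declared)"
    }
}

struct AggregateWithDecl;
impl Operator for AggregateWithDecl {
    fn name(&self) -> &str {
        "PartitionedAggregateExec (ordering declared)"
    }
    fn required_input_ordering(&self) -> Vec<Option<Vec<String>>> {
        vec![Some(vec!["project_id".to_string(), "user_id".to_string()])]
    }
}

/// A `Sort` feeding the parent's first child slot is kept only if the parent
/// declares an ordering requirement for that child; otherwise a sort-removal
/// pass is free to drop it as unnecessary work.
fn keep_sort_under(parent: &dyn Operator) -> bool {
    matches!(parent.required_input_ordering().first(), Some(Some(_)))
}

fn main() {
    let ops: [&dyn Operator; 2] = [&AggregateNoDecl, &AggregateWithDecl];
    for op in ops {
        println!("{}: keep Sort = {}", op.name(), keep_sort_under(op));
    }
}
```

Only the operator that declares its requirement keeps the `Sort` beneath it, which matches the suspected cause here.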