[ https://issues.apache.org/jira/browse/ARROW-11606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17283458#comment-17283458 ]
Andy Grove commented on ARROW-11606:
------------------------------------
[~jorgecarleitao] We could use your guidance here if you have time.
> [Rust] [DataFusion] Need guidance on HashAggregateExec reconstruction
> ---------------------------------------------------------------------
>
> Key: ARROW-11606
> URL: https://issues.apache.org/jira/browse/ARROW-11606
> Project: Apache Arrow
> Issue Type: Improvement
> Components: Rust - DataFusion
> Reporter: Andy Grove
> Priority: Major
>
> We have run into an issue in the Ballista project where we are reconstructing
> the Final and Partial HashAggregateExec operators [1] for distributed
> execution and we need some guidance.
> The Partial HashAggregateExec is created and executes correctly.
> However, when we create the Final HashAggregateExec, it does not find the
> schema it expects in its input operator. The Partial exec outputs field
> names ending with "[sum]", "[count]", and so on, but the Final exec doesn't
> seem to be looking for those names.
> It is also worth noting that the Final and Partial operators are not
> connected directly in this usage: the Partial exec is executed and its
> output is streamed to disk, and the Final exec then runs against that
> output.
> We may need to make changes in DataFusion so that other crates can support
> this kind of use case.
> [1] https://github.com/ballista-compute/ballista/pull/491
>
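
To make the mismatch described above concrete, here is a minimal sketch in
plain Rust with no DataFusion dependency. Everything in it is an assumption
for illustration only: the helper names `partial_output_fields` and
`index_of`, the `AVG(price)` expression, and the exact `expr[state]` naming
are hypothetical and are not DataFusion APIs. It only shows how a lookup by
name can fail once the Partial stage's output fields carry a state suffix
such as "[sum]" or "[count]".

```rust
// Hypothetical stand-in for a schema: an ordered list of field names.
// None of these functions exist in DataFusion; they only model the symptom.

/// Build the field names a Partial aggregate stage might emit: the group-by
/// columns first, then one field per intermediate state of each aggregate,
/// named `expr[state]` (assumed naming, based on the "[sum]" / "[count]"
/// suffixes mentioned in the issue).
fn partial_output_fields(group_by: &[&str], aggregates: &[(&str, &[&str])]) -> Vec<String> {
    let mut fields: Vec<String> = group_by.iter().map(|g| g.to_string()).collect();
    for (expr, states) in aggregates {
        for state in states.iter() {
            fields.push(format!("{}[{}]", expr, state));
        }
    }
    fields
}

/// Look a field up by exact name, the way a name-based schema lookup would.
fn index_of(fields: &[String], name: &str) -> Option<usize> {
    fields.iter().position(|f| f == name)
}
```

With `partial_output_fields(&["region"], &[("AVG(price)", &["count", "sum"][..])])`,
a lookup of the unsuffixed name `AVG(price)` finds nothing, while
`AVG(price)[sum]` resolves to a column. If the Final stage is handed the
on-disk Partial output and resolves its inputs by a different naming
convention, this is the kind of failure it would hit.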