alamb commented on issue #6937:
URL:
https://github.com/apache/arrow-datafusion/issues/6937#issuecomment-1681317933
> Doing that reduces the memory usage, but often with higher cost, which can
> be seen in the benchmark:
Maybe we can get the performance back somehow (for example, by making the output
creation faster) 🤔
Alternatively, we could consider making a single group operator that does the
two-phase grouping within itself.
So instead of
```
group by (final)
repartition
group by (initial)
```
We would have
```
group by
```
And do the repartitioning within the operator itself (and thus, if the first
phase isn't helping, e.g. it isn't reducing the row count much, we can switch
to the second phase)
This might impact downstream projects like Ballista that want to distribute
the first phase, however 🤔
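
To make the single-operator idea a bit more concrete, here is a rough sketch in Python. It is not DataFusion code: `adaptive_group_count`, `probe_rows`, and `min_reduction` are hypothetical names, the aggregate is hard-coded to COUNT, and the "switch" rule (give up on the first phase once the number of groups exceeds a fraction of the rows seen) is just one assumption about what "isn't helping" could mean.

```python
from collections import Counter


def adaptive_group_count(batches, num_partitions=2, probe_rows=4, min_reduction=0.5):
    """Count rows per key with one operator that owns both grouping phases.

    All names and thresholds here are illustrative assumptions.
    """
    # Second-phase state: one hash table per output partition.
    finals = [Counter() for _ in range(num_partitions)]
    # First-phase (pre-aggregation) state.
    partial = Counter()
    rows_seen = 0
    use_partial = True

    def route(key, count):
        # In-operator "repartition": each key goes to exactly one partition.
        finals[hash(key) % num_partitions][key] += count

    for batch in batches:
        if not use_partial:
            # First phase was abandoned: send rows straight to the second phase.
            for key in batch:
                route(key, 1)
            continue

        partial.update(batch)
        rows_seen += len(batch)

        # Once enough rows were seen, check whether pre-aggregation collapses rows.
        if rows_seen >= probe_rows and len(partial) > min_reduction * rows_seen:
            for key, count in partial.items():
                route(key, count)
            partial.clear()
            use_partial = False

    # Flush whatever the first phase still holds.
    for key, count in partial.items():
        route(key, count)

    # Partitions hold disjoint keys, so merging them is a plain union.
    merged = {}
    for part in finals:
        merged.update(part)
    return merged, use_partial
```

With mostly distinct keys the sketch drops the first phase after the probe window; with few distinct keys it keeps pre-aggregating. A distributed engine that wants to run the first phase on a different node could not use an operator shaped like this as-is, which is the downstream concern mentioned above.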
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.