ozankabak commented on PR #6034: URL: https://github.com/apache/arrow-datafusion/pull/6034#issuecomment-1522548414
> I'm not very familiar with the group by code anymore, but the use of per-group allocations does immediately stand out to me as at risk of thrashing the memory allocator, and consequently making the code not just slow but also wildly unpredictable. I'm aware this isn't something introduced by this PR, but revisiting this design (see https://github.com/apache/arrow-datafusion/issues/4973) may make it easier to make changes in this area without introducing regressions. I agree, we will be happy to take part in that effort. Going back to the scope of this PR: If everyone agrees, we will generate another changeset with Approach 2, and @alamb can check how the performance looks there (since our local benchmarks do not show a difference even now). We can then move forward with either that, or this version. Sounds good? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
