ashdnazg opened a new pull request, #15924: URL: https://github.com/apache/datafusion/pull/15924
## Which issue does this PR close?

- Closes #15923.

## Rationale for this change

When aggregating first/last over a column of lists, the first/last accumulators hold the selected scalar value as is, so it still points to the list inside the original input buffer. This results in two issues:

1. We prevent the deallocation of the input arrays, which might be significantly larger than the single value we want to hold.
2. During aggregation with groups, many accumulators receive slices of the same input buffer, so all held values point to this buffer. Then, when calculating the size of all accumulators, we count the buffer multiple times, since each accumulator considers it to be part of its own allocation.

## What changes are included in this PR?

The PR copies/compacts the scalar values held by the first/last accumulators, such that they no longer hold the entire input buffer, thereby solving both 1 and 2.

## Are these changes tested?

Two tests are added.

## Are there any user-facing changes?

There's a new `compact` function, which might or might not be desired to be `pub`. I thought it might be useful somewhere else, but YMMV.
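To illustrate the two issues from the rationale, here is a minimal std-only sketch. It does not use the Arrow or DataFusion APIs: `Slice`, `retained_bytes`, and this `compact` are hypothetical stand-ins for an array slice, an accumulator's size accounting, and the copy step, and are not the code in this PR.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for a sliced array: a window into a shared buffer.
#[derive(Clone)]
struct Slice {
    buffer: Arc<Vec<i64>>,
    offset: usize,
    len: usize,
}

impl Slice {
    /// The elements visible through this window.
    fn values(&self) -> &[i64] {
        &self.buffer[self.offset..self.offset + self.len]
    }

    /// Bytes this value keeps alive: the whole backing buffer,
    /// not just the visible window.
    fn retained_bytes(&self) -> usize {
        self.buffer.len() * std::mem::size_of::<i64>()
    }

    /// Copy only the visible window into a fresh, right-sized buffer.
    fn compact(&self) -> Slice {
        Slice {
            buffer: Arc::new(self.values().to_vec()),
            offset: 0,
            len: self.len,
        }
    }
}

fn main() {
    // One large input buffer shared by every group.
    let input = Arc::new((0..1_000).collect::<Vec<i64>>());

    // One held value per group, each a two-element window into `input`.
    let held: Vec<Slice> = (0..10)
        .map(|g| Slice { buffer: Arc::clone(&input), offset: g * 2, len: 2 })
        .collect();

    // Summing per-value sizes counts the shared buffer once per group.
    let naive: usize = held.iter().map(Slice::retained_bytes).sum();

    // After compaction each value owns only its own two elements.
    let compacted: Vec<Slice> = held.iter().map(Slice::compact).collect();
    let fixed: usize = compacted.iter().map(Slice::retained_bytes).sum();

    println!("reported before compaction: {naive} bytes");
    println!("reported after compaction:  {fixed} bytes");
}
```

In this toy model, every uncompacted value keeps the 1,000-element buffer alive (issue 1), and summing the per-value sizes counts that buffer ten times (issue 2). After the copy, each value references only its own elements, so `input` can be freed once the caller drops it.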