ashdnazg opened a new pull request, #15924:
URL: https://github.com/apache/datafusion/pull/15924

   ## Which issue does this PR close?
   - Closes #15923.
   
   ## Rationale for this change
   
   When aggregating first/last over a column of lists, the first/last
   accumulators hold the selected scalar value as is, so it still points to
   the list in the original input buffer.
   
   This results in two issues:
   
   1) We prevent the deallocation of the input arrays, which might be
   significantly larger than the single value we want to hold.
   
   2) During aggregation with groups, many accumulators receive slices of the
   same input buffer, resulting in all held values pointing to this buffer.
   Then, when calculating the size of all accumulators we count the buffer
   multiple times, since each accumulator considers it to be part of its own
   allocation.
   
   
   ## What changes are included in this PR?
   
   This PR copies (compacts) the scalar values held by the first/last
   operators so that they no longer keep the entire input buffer alive,
   which resolves both issues 1 and 2.
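
To make the two issues concrete, here is a minimal standard-library sketch of the idea. It is not the DataFusion or Arrow code: `ListView`, `retained_bytes` and `total_retained` are hypothetical names invented for illustration, and `compact` only borrows the name used in this PR, not its implementation. The assumption is that a held list value behaves like a reference-counted view (offset and length) into a shared buffer.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for a held list scalar: a view into a shared buffer.
struct ListView {
    buffer: Arc<Vec<i64>>, // shared backing allocation (the "input buffer")
    offset: usize,
    len: usize,
}

impl ListView {
    /// The elements this view actually refers to.
    fn values(&self) -> &[i64] {
        &self.buffer[self.offset..self.offset + self.len]
    }

    /// Size estimate as a single accumulator would report it: the whole
    /// backing buffer, even though only `len` elements are needed.
    fn retained_bytes(&self) -> usize {
        self.buffer.len() * std::mem::size_of::<i64>()
    }

    /// Copy only the viewed elements into a fresh, right-sized buffer,
    /// dropping the dependency on the original allocation.
    fn compact(&self) -> ListView {
        let owned = self.values().to_vec();
        let len = owned.len();
        ListView { buffer: Arc::new(owned), offset: 0, len }
    }
}

/// Sum of the per-view estimates. A buffer shared by several views is
/// counted once per view, which mirrors issue 2.
fn total_retained(views: &[ListView]) -> usize {
    views.iter().map(ListView::retained_bytes).sum()
}
```

In this model, four views of two elements each over one 1000-element buffer report four times the full buffer size in `total_retained`, and the buffer cannot be freed while any view is alive. After mapping each view through `compact`, every view owns only its two elements, and dropping the original views releases the large buffer.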
   
   ## Are these changes tested?
   
   Two tests are added.
   
   ## Are there any user-facing changes?
   
   There's a new `compact` function, which may or may not need to be `pub`.
   I thought it might be useful elsewhere, but YMMV.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

