tustvold commented on PR #6034:
URL: 
https://github.com/apache/arrow-datafusion/pull/6034#issuecomment-1522449963

   > to prevent unnecessary empty vector creation
   
   FWIW an empty vector just contains a `NonNull::dangling`, it doesn't 
allocate anything, so wrapping it in an `Option` is redundant. That being said 
the additions to `GroupState` will make it non-trivially larger, which could 
conceivably cause performance regressions.
   
   I'm not very familiar with the group by code anymore, but the use of 
per-group allocations does immediately stand out to me as at risk of thrashing 
the memory allocator, and consequently making the code not just slow but also 
wildly unpredictable. I'm aware this isn't something introduced by this PR, but 
revisiting this design (see #4973) may make it easier to make changes in this 
area without introducing regressions.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to