Re: [PR] feat: add `with_group_indices_order_mode` function for `GroupsAccumulator` to help create specialized impl [datafusion]

via GitHub Mon, 21 Apr 2025 09:08:05 -0700


rluvaton commented on PR #15022:
URL: https://github.com/apache/datafusion/pull/15022#issuecomment-2818864086


   > Thank you @rluvaton. I had some difficulty to understand what does this PR 
actually solve. If you can share a real case to demonstrate how this order in 
metadata works in a real use case, it would greatly help in understanding the 
need for this change. AFAICS this path is only executed during spill scenarios 
at the moment.
   
   This exposes to the `GroupsAccumulator` whether the group indices are sorted 
or not.
   
   This allows a group accumulator to apply order-specific optimizations, for 
example keeping only the state of the current group.
   
   One optimization that becomes possible when the group indices are sorted 
is in a count distinct implementation. Once a group index stops appearing, 
you know it will not come back, so you can clear the internal hash set that 
was used to track the unique values of that group.
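
To make the count distinct example concrete, here is a minimal standalone sketch. It does not use the DataFusion API: `GroupIndicesOrder`, `count_distinct`, and both helper functions are hypothetical names made up for illustration, and the values are plain `i64` slices rather than Arrow arrays. It only assumes that "sorted" means equal group indices arrive contiguously and never reappear later.

```rust
use std::collections::{BTreeMap, HashSet};

/// Hypothetical stand-in for the ordering hint discussed in this PR.
/// This is NOT the actual DataFusion type.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum GroupIndicesOrder {
    Unordered,
    Sorted,
}

/// Sorted input: only the current group's hash set is kept in memory.
/// Returns `(group_index, distinct_count)` pairs in input order.
fn count_distinct_sorted(values: &[i64], group_indices: &[usize]) -> Vec<(usize, usize)> {
    let mut out = Vec::new();
    let mut current: Option<usize> = None;
    let mut seen: HashSet<i64> = HashSet::new();
    for (&value, &group) in values.iter().zip(group_indices) {
        if current != Some(group) {
            // The previous group can never appear again, so emit its
            // count and reuse the same hash set for the next group.
            if let Some(done) = current {
                out.push((done, seen.len()));
                seen.clear();
            }
            current = Some(group);
        }
        seen.insert(value);
    }
    if let Some(done) = current {
        out.push((done, seen.len()));
    }
    out
}

/// Unordered input: one hash set per group has to stay alive until the end.
fn count_distinct_unordered(values: &[i64], group_indices: &[usize]) -> Vec<(usize, usize)> {
    let mut sets: BTreeMap<usize, HashSet<i64>> = BTreeMap::new();
    for (&value, &group) in values.iter().zip(group_indices) {
        sets.entry(group).or_default().insert(value);
    }
    sets.into_iter().map(|(group, set)| (group, set.len())).collect()
}

/// Picks the specialized implementation based on the ordering hint.
fn count_distinct(
    values: &[i64],
    group_indices: &[usize],
    order: GroupIndicesOrder,
) -> Vec<(usize, usize)> {
    match order {
        GroupIndicesOrder::Sorted => count_distinct_sorted(values, group_indices),
        GroupIndicesOrder::Unordered => count_distinct_unordered(values, group_indices),
    }
}
```

Both paths produce the same counts on sorted input. The difference is peak memory: the sorted path holds one hash set at a time, while the unordered path holds one per group.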
   
   > How does spilling disrupt the order, and how does this fix restore it? Are 
there any other use cases for this feature as well? 
   
   Spilling does not disrupt the order. When there are spills, we go to the 
merge phase and merge all the spill files into one sorted stream, so we can 
take advantage of that ordering by adapting our implementation.
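
As a rough illustration of why the merge phase yields sorted group keys, here is a toy k-way merge over in-memory runs. This is not how DataFusion implements its spill merge: `merge_sorted_runs` is a hypothetical helper, and each "spill file" is assumed to be a `Vec` of `(group_key, value)` rows that is already sorted.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Merges already-sorted runs of `(group_key, value)` rows into one sorted
/// stream, using a min-heap over the head row of each run.
fn merge_sorted_runs(runs: &[Vec<(u32, i64)>]) -> Vec<(u32, i64)> {
    // Heap entries are (row, run index, position in run); `Reverse` turns
    // the std max-heap into a min-heap.
    let mut heap = BinaryHeap::new();
    for (run_idx, run) in runs.iter().enumerate() {
        if let Some(&row) = run.first() {
            heap.push(Reverse((row, run_idx, 0usize)));
        }
    }

    let total = runs.iter().map(Vec::len).sum::<usize>();
    let mut merged = Vec::with_capacity(total);
    while let Some(Reverse((row, run_idx, pos))) = heap.pop() {
        merged.push(row);
        // Refill the heap with the next row from the run we just consumed.
        if let Some(&next) = runs[run_idx].get(pos + 1) {
            heap.push(Reverse((next, run_idx, pos + 1)));
        }
    }
    merged
}
```

Because the output is ordered by group key, every row of a given group is contiguous in the merged stream, which is the property an order-aware accumulator would rely on.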
   
   This is not a fix. It propagates knowledge that the operator already has, 
so that accumulators can use it.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

