Dandandan commented on PR #15985:
URL: https://github.com/apache/datafusion/pull/15985#issuecomment-2863220782

   This gets a small performance boost on clickbench query 9 (~9% on my end).
   
   I am actually wondering if we can go further. I think we could store 
something like `HashSet<(T::Native, usize)>` (unique value + group id) instead 
of `Vec<HashSet<T::Native>>` (one hashset per group) and delay counting the 
values until the end by iterating over all the entries (instead of calling `.len()`).
   
   The "obvious" advantage is that we avoid creating *many* hashsets in high 
cardinality cases, which hurts both performance and memory usage.
   
   However, it seems kind of tricky to integrate this into the current 
`GroupsAccumulator` setup 🤔 
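
   A minimal sketch of the two layouts, just to illustrate the idea. It uses plain `std` collections with `i64` standing in for `T::Native`; the function names are hypothetical and this is not the actual DataFusion accumulator code:

```rust
use std::collections::HashSet;

/// Current shape (sketch): one hash set per group, distinct count is `len()`.
fn count_distinct_per_group_sets(
    values: &[i64],
    group_indices: &[usize],
    num_groups: usize,
) -> Vec<usize> {
    let mut sets: Vec<HashSet<i64>> = vec![HashSet::new(); num_groups];
    for (&v, &g) in values.iter().zip(group_indices) {
        sets[g].insert(v);
    }
    sets.iter().map(|s| s.len()).collect()
}

/// Proposed shape (sketch): a single set keyed by (value, group id).
/// Counting is delayed until the end, when every entry is visited once.
fn count_distinct_single_set(
    values: &[i64],
    group_indices: &[usize],
    num_groups: usize,
) -> Vec<usize> {
    let mut seen: HashSet<(i64, usize)> = HashSet::new();
    for (&v, &g) in values.iter().zip(group_indices) {
        seen.insert((v, g));
    }
    // Each surviving (value, group) pair contributes one distinct value
    // to its group.
    let mut counts = vec![0usize; num_groups];
    for &(_, g) in &seen {
        counts[g] += 1;
    }
    counts
}
```

   Both functions should return the same counts; the second allocates a single table regardless of the number of groups, at the cost of a full pass over the set at the end.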


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
