Re: [PR] perf: Implement groups accumulator count distinct primitive types [datafusion]

via GitHub Wed, 15 Apr 2026 04:20:12 -0700


Dandandan commented on PR #21561:
URL: https://github.com/apache/datafusion/pull/21561#issuecomment-4251549675


   > @Dandandan, I am trying a vector of (group ID, value) pairs approach to 
   > see if a SIMD-friendly sort (returning group counts only during `evaluate` 
   > and `state`) would cost less than computing hashes in `update_batch`, which 
   > runs more frequently. Results were promising on my local machine, but I 
   > would be happy if you could run the benchmarks on GH runners for more 
   > accuracy. I also moved the distinct accumulators to a separate cold path 
   > (they were probably causing the minor slowness in other count queries).
   > 
   > Could you please trigger CI benchmarks to see if this adds value over the 
   > hashtable approach?
   
   It seems to me the problem would be memory use (whenever the values are non-unique)?
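
   To make the trade-off concrete, here is a minimal standalone sketch. It is not the PR's code and does not use DataFusion's `GroupsAccumulator` trait; the type names `PairsDistinctCount` and `HashDistinctCount`, the `i64` value type, and the helper methods are all hypothetical. It only illustrates that buffering every (group ID, value) pair grows with the number of input rows, while a per-group hash set grows with the number of distinct values.

```rust
use std::collections::HashSet;

/// Hypothetical sort-based accumulator: buffers every (group ID, value) pair
/// and defers all deduplication to `evaluate`.
struct PairsDistinctCount {
    pairs: Vec<(usize, i64)>,
}

impl PairsDistinctCount {
    fn new() -> Self {
        Self { pairs: Vec::new() }
    }

    /// Per-batch work is an append only; no hashing happens here.
    fn update_batch(&mut self, group_ids: &[usize], values: &[i64]) {
        self.pairs
            .extend(group_ids.iter().copied().zip(values.iter().copied()));
    }

    /// Sort and dedup once, then count the surviving pairs per group.
    /// Assumes every buffered group ID is < `num_groups`.
    fn evaluate(&mut self, num_groups: usize) -> Vec<u64> {
        self.pairs.sort_unstable();
        self.pairs.dedup();
        let mut counts = vec![0u64; num_groups];
        for &(group, _) in &self.pairs {
            counts[group] += 1;
        }
        counts
    }

    /// Number of pairs currently held in memory.
    fn buffered_len(&self) -> usize {
        self.pairs.len()
    }
}

/// Hypothetical hash-based baseline: one set of distinct values per group.
struct HashDistinctCount {
    sets: Vec<HashSet<i64>>,
}

impl HashDistinctCount {
    fn new() -> Self {
        Self { sets: Vec::new() }
    }

    /// Hashes every value, but duplicates take no extra space.
    fn update_batch(&mut self, group_ids: &[usize], values: &[i64]) {
        for (&group, &value) in group_ids.iter().zip(values) {
            if group >= self.sets.len() {
                self.sets.resize_with(group + 1, HashSet::new);
            }
            self.sets[group].insert(value);
        }
    }

    fn evaluate(&self, num_groups: usize) -> Vec<u64> {
        (0..num_groups)
            .map(|g| self.sets.get(g).map_or(0, |s| s.len() as u64))
            .collect()
    }

    /// Number of values currently held in memory across all groups.
    fn stored_len(&self) -> usize {
        self.sets.iter().map(|s| s.len()).sum()
    }
}

fn main() {
    // Two identical batches with heavily repeated values.
    let group_ids = [0usize, 0, 1, 0, 1];
    let values = [7i64, 7, 3, 7, 3];

    let mut pairs = PairsDistinctCount::new();
    let mut hashed = HashDistinctCount::new();
    for _ in 0..2 {
        pairs.update_batch(&group_ids, &values);
        hashed.update_batch(&group_ids, &values);
    }

    println!("buffered pairs: {}", pairs.buffered_len());
    println!("stored distinct values: {}", hashed.stored_len());
    println!("pairs counts: {:?}", pairs.evaluate(2));
    println!("hash counts: {:?}", hashed.evaluate(2));
}
```

   Both variants return the same counts; the difference is what they hold between `update_batch` and `evaluate`. A real implementation could bound the buffer by sorting and deduplicating periodically, at the price of doing some of the deferred work earlier.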


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


