Dandandan commented on PR #21561: URL: https://github.com/apache/datafusion/pull/21561#issuecomment-4251549675
> @Dandandan, I am trying a vector of (group ID, value) pairs approach to see if SIMD helps (sorting and returning group counts only during `evaluate` and `state` would cost less than computing hashes in `update_batch`, which is called more often). Results were promising on my local machine, but I would be happy if you could run the benchmarks on GH runners for more accuracy. I also moved the distinct accumulators to a separate cold path (they were probably causing the minor slowness in other count queries).
>
> Could you please trigger the CI benchmarks to see if this adds value over the hashtable approach?

It seems to me the problem would be memory use (whenever the values are non-unique)?
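
To make the memory concern concrete, here is a minimal sketch of the two strategies, assuming the accumulator counts distinct values per group. The types `HashDistinct` and `PairDistinct` and the `buffered` helper are hypothetical names for illustration only; they borrow the `update_batch`/`evaluate` method names but are not DataFusion's actual accumulator API or the PR's code.

```rust
use std::collections::HashSet;

/// Hypothetical hash-based accumulator: pays for hashing on every batch,
/// but only ever stores one entry per distinct (group, value) pair.
#[derive(Default)]
struct HashDistinct {
    seen: HashSet<(usize, i64)>,
}

impl HashDistinct {
    fn update_batch(&mut self, group_ids: &[usize], values: &[i64]) {
        for (&g, &v) in group_ids.iter().zip(values) {
            self.seen.insert((g, v));
        }
    }

    fn evaluate(&self, num_groups: usize) -> Vec<u64> {
        let mut counts = vec![0u64; num_groups];
        for &(g, _) in &self.seen {
            counts[g] += 1;
        }
        counts
    }

    /// Number of entries currently held in memory.
    fn buffered(&self) -> usize {
        self.seen.len()
    }
}

/// Hypothetical pair-vector accumulator: `update_batch` is a plain append,
/// and the sort + dedup work is deferred to `evaluate`.
#[derive(Default)]
struct PairDistinct {
    pairs: Vec<(usize, i64)>,
}

impl PairDistinct {
    fn update_batch(&mut self, group_ids: &[usize], values: &[i64]) {
        // No hashing here, but duplicates are kept until `evaluate`.
        self.pairs
            .extend(group_ids.iter().copied().zip(values.iter().copied()));
    }

    fn evaluate(&mut self, num_groups: usize) -> Vec<u64> {
        // Sort by (group, value), then drop adjacent duplicates.
        self.pairs.sort_unstable();
        self.pairs.dedup();
        let mut counts = vec![0u64; num_groups];
        for &(g, _) in &self.pairs {
            counts[g] += 1;
        }
        counts
    }

    /// Number of entries currently held in memory.
    fn buffered(&self) -> usize {
        self.pairs.len()
    }
}
```

In this sketch both variants return the same per-group counts, but before `evaluate` the pair vector holds one entry per input row while the hash set holds one entry per distinct pair, so the gap grows with the amount of duplication in the input.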
