coderfender commented on PR #21561: URL: https://github.com/apache/datafusion/pull/21561#issuecomment-4247315039
@Dandandan, I am trying a vector of (group ID, value) pairs approach to see whether a SIMD-friendly sort that returns group counts only during `evaluate` would cost less than computing hashes in the `update_batch` method. Results were promising locally, but I would be happy if you could run the benchmarks on the GH runners for more accuracy. I also moved the distinct accumulators to a separate cold path (they were probably causing the minor slowness in other count queries).
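To make the idea concrete, here is a minimal sketch of the pairs approach, not the code in this PR. It assumes the goal is a per-group distinct count over `i64` values. The type `PairDistinctCount` and its method signatures are hypothetical and do not implement DataFusion's `GroupsAccumulator` trait. `update_batch` only appends pairs with no hashing, and all the sorting and counting work is deferred to `evaluate`.

```rust
/// Hypothetical accumulator for illustration only; it is not part of DataFusion.
/// It buffers raw (group ID, value) pairs instead of maintaining hash sets.
#[derive(Default)]
pub struct PairDistinctCount {
    pairs: Vec<(usize, i64)>,
}

impl PairDistinctCount {
    /// Hot path: append (group ID, value) pairs without hashing anything.
    pub fn update_batch(&mut self, group_indices: &[usize], values: &[i64]) {
        assert_eq!(group_indices.len(), values.len());
        self.pairs
            .extend(group_indices.iter().copied().zip(values.iter().copied()));
    }

    /// Deferred work: sort the pairs, drop adjacent duplicates, then count
    /// the surviving pairs per group. Assumes every group ID is < num_groups.
    pub fn evaluate(&mut self, num_groups: usize) -> Vec<i64> {
        // Tuples sort lexicographically: by group ID first, then by value.
        self.pairs.sort_unstable();
        // After sorting, identical pairs are adjacent, so dedup removes them all.
        self.pairs.dedup();

        let mut counts = vec![0i64; num_groups];
        for &(group, _) in &self.pairs {
            counts[group] += 1;
        }
        counts
    }
}
```

The trade-off in this sketch is memory: the buffer grows with the number of input rows rather than the number of distinct values, so whether it wins over hashing depends on the data, which is what the benchmarks would need to show.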
