drusso opened a new pull request #8606:
URL: https://github.com/apache/arrow/pull/8606
[ARROW-10510](https://issues.apache.org/jira/browse/ARROW-10510)

This change adds benchmarks for `COUNT(DISTINCT)` queries. It is a small follow-up to [ARROW-10043](https://issues.apache.org/jira/browse/ARROW-10043) / #8222. That PR discussed a number of implementation ideas for follow-up work, and having benchmarks will help evaluate them.

---

Two benchmarks are added:

* wide: all of the values are distinct; this measures worst-case performance
* narrow: there are only a handful of distinct values; this is closer to best-case performance

The wide benchmark runs about 7x slower than the narrow benchmark.
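To illustrate why "wide" is the worst case and "narrow" is close to the best case, here is a minimal Rust sketch. It assumes, for illustration only, that the distinct count is accumulated in a hash set; the PR text does not describe the implementation, and the helpers `count_distinct`, `wide_input`, and `narrow_input` are hypothetical, not the benchmark code. With all-distinct input the set grows to one entry per row (more allocation, rehashing, and cache misses), while with a handful of distinct values it stays tiny and most inserts hit an existing entry.

```rust
use std::collections::HashSet;

/// Hypothetical helper: counts distinct values by inserting them into a hash set.
/// This is an assumed approach for illustration, not the actual query engine code.
fn count_distinct(values: &[u64]) -> usize {
    let mut seen = HashSet::new();
    for v in values {
        // Each new distinct value grows the set; duplicates are no-ops.
        seen.insert(*v);
    }
    seen.len()
}

/// Hypothetical "wide" input: every value is distinct (set grows to `n` entries).
fn wide_input(n: u64) -> Vec<u64> {
    (0..n).collect()
}

/// Hypothetical "narrow" input: only `k` distinct values, repeated across `n` rows.
fn narrow_input(n: u64, k: u64) -> Vec<u64> {
    (0..n).map(|i| i % k).collect()
}

fn main() {
    let n = 100_000;
    let wide = wide_input(n);
    let narrow = narrow_input(n, 8);
    println!("wide distinct = {}", count_distinct(&wide));
    println!("narrow distinct = {}", count_distinct(&narrow));
}
```

Running it prints `wide distinct = 100000` and `narrow distinct = 8`: both inputs have the same number of rows, but the amount of per-query state differs by orders of magnitude, which is the kind of gap the two benchmarks are meant to capture.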
