drusso opened a new pull request #8606:
URL: https://github.com/apache/arrow/pull/8606


   [ARROW-10510](https://issues.apache.org/jira/browse/ARROW-10510)
   
   This change adds benchmarks for `COUNT(DISTINCT)` queries. It is a small 
follow-up to [ARROW-10043](https://issues.apache.org/jira/browse/ARROW-10043) / 
#8222. Several implementation ideas were proposed in that PR as 
follow-ups, and benchmarks will help evaluate them. 
   
   ---
   
   Two benchmarks are added:
   
   * wide: every value is distinct; this measures worst-case 
performance
   * narrow: only a handful of distinct values; this is closer to best-case 
performance
   
   The wide benchmark runs about 7x slower than the narrow benchmark. 
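
To make the two input shapes concrete, here is a minimal, language-neutral sketch in Python. It is not the benchmark code from this PR: the function names (`wide_values`, `narrow_values`, `count_distinct`, `time_count_distinct`), the cardinality of 4 for the narrow case, and the hash-set counting strategy are all assumptions chosen for illustration.

```python
import time


def wide_values(n):
    """Every value is distinct: the worst-case shape for COUNT(DISTINCT)."""
    return list(range(n))


def narrow_values(n, cardinality=4):
    """Only `cardinality` distinct values, repeated: near the best case.

    The default of 4 is an illustrative assumption, not taken from the PR.
    """
    return [i % cardinality for i in range(n)]


def count_distinct(values):
    """Hash-set based distinct count (an assumed strategy, not the PR's code)."""
    return len(set(values))


def time_count_distinct(values, repeats=5):
    """Return the best wall-clock time in seconds over `repeats` runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        count_distinct(values)
        best = min(best, time.perf_counter() - start)
    return best


if __name__ == "__main__":
    n = 1_000_000
    for name, values in [("wide", wide_values(n)), ("narrow", narrow_values(n))]:
        elapsed = time_count_distinct(values)
        print(f"{name}: {count_distinct(values)} distinct, {elapsed:.4f} s")
```

Timings from this sketch are not comparable to the ratio reported above, since they measure Python's built-in `set` rather than the query engine. It only shows why the wide case does more work: the set of seen values grows with every row instead of staying at a handful of entries.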
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

