liukun4515 commented on issue #1823:
URL: https://github.com/apache/arrow-datafusion/issues/1823#issuecomment-1038800123


   > I have implemented an initial version and got the results below on 1million_rows_10thousand_distinct.parquet:
   > 
   > ```
   > 1. count distinct
   > +----------------------------+
   > | COUNT(DISTINCT test.value) |
   > +----------------------------+
   > | 10000                      |
   > +----------------------------+
   > 1 row in set. Query took 0.237 seconds.
   > 
   > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   > 2. bitmap distinct (roaring-rs)
   > 
   > +---------------------------------+
   > | BITMAPCOUNTDISTINCT(test.value) |
   > +---------------------------------+
   > | 10000                           |
   > +---------------------------------+
   > 1 row in set. Query took 0.052 seconds
   > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   > 3. approx distinct (hll)
   > 
   > +----------------------------+
   > | APPROXDISTINCT(test.value) |
   > +----------------------------+
   > | 9943                       |
   > +----------------------------+
   > 1 row in set. Query took 0.047 seconds.
   > ```
   > 
   > The bitmap used is [this](https://github.com/RoaringBitmap/roaring-rs). I have checked that influx_iox uses [croaring-rs](https://github.com/saulius/croaring-rs). @alamb Sorry to bother you 😂, could you share some info on why it uses croaring-rs? If you have a benchmark result, that would be fantastic 👍!
   
   Could you please open a draft pull request for this?
   @Ted-Jiang 
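
For readers following the thread, here is a minimal sketch of why a bitmap can answer an exact distinct count faster than a hash set. It uses only the standard library: the fixed-size bitmap below is a simplified stand-in for a compressed roaring bitmap and is not the roaring-rs or croaring-rs API. The function names, the `max_value` parameter, and the synthetic data (shaped like the 1M rows / 10k distinct values file quoted above) are assumptions for illustration only.

```rust
use std::collections::HashSet;

/// Exact distinct count via a hash set, roughly the generic
/// COUNT(DISTINCT ...) strategy: hash every value and keep one copy.
fn count_distinct_hashset(values: &[u32]) -> usize {
    values.iter().copied().collect::<HashSet<u32>>().len()
}

/// Exact distinct count via a plain, uncompressed bitmap with one bit
/// per possible value in 0..=max_value. Hypothetical helper: a roaring
/// bitmap compresses this layout, but the counting idea is the same.
fn count_distinct_bitmap(values: &[u32], max_value: u32) -> usize {
    let mut words = vec![0u64; (max_value as usize / 64) + 1];
    for &v in values {
        assert!(v <= max_value, "value out of range for bitmap");
        // Set the bit for this value; duplicates set the same bit again.
        words[(v / 64) as usize] |= 1u64 << (v % 64);
    }
    // The number of set bits is the number of distinct values.
    words.iter().map(|w| w.count_ones() as usize).sum()
}

fn main() {
    // Synthetic data (assumption): 1,000,000 rows, 10,000 distinct values.
    let values: Vec<u32> = (0..1_000_000u32).map(|i| i % 10_000).collect();

    let by_hashset = count_distinct_hashset(&values);
    let by_bitmap = count_distinct_bitmap(&values, 9_999);

    assert_eq!(by_hashset, by_bitmap);
    println!("hashset = {by_hashset}, bitmap = {by_bitmap}");
}
```

Both functions return an exact count, unlike the HLL-based approximation. The bitmap version avoids hashing and per-entry allocation, but it only applies to integer-like values.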


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

