liukun4515 commented on issue #1823: URL: https://github.com/apache/arrow-datafusion/issues/1823#issuecomment-1038800123
> I have implemented an initial version and got the results below on `1million_rows_10thousand_distinct.parquet`:
>
> ```
> 1. count distinct
> +----------------------------+
> | COUNT(DISTINCT test.value) |
> +----------------------------+
> | 10000                      |
> +----------------------------+
> 1 row in set. Query took 0.237 seconds.
>
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> 2. bitmap distinct (roaring-rs)
>
> +---------------------------------+
> | BITMAPCOUNTDISTINCT(test.value) |
> +---------------------------------+
> | 10000                           |
> +---------------------------------+
> 1 row in set. Query took 0.052 seconds
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> 3. approx distinct (hll)
>
> +----------------------------+
> | APPROXDISTINCT(test.value) |
> +----------------------------+
> | 9943                       |
> +----------------------------+
> 1 row in set. Query took 0.047 seconds.
> ```
>
> The bitmap used is [this](https://github.com/RoaringBitmap/roaring-rs). I have checked that influx_iox uses [croaring-rs](https://github.com/saulius/croaring-rs).
> @alamb Sorry to bother you 😂, could you share some info on why croaring-rs is used? If you have a benchmark result, that would be fantastic 👍!

Could you please file a draft pull request? @Ted-Jiang