coderfender commented on PR #21453:
URL: https://github.com/apache/datafusion/pull/21453#issuecomment-4204666466

   It seems my bitmap setup was suboptimal for `u8`/`i8`. Instead of using
`[u8;4]`, I tried not bothering with the dense packing (which might cause cache
misses) and went with `[bool; 256]`. This significantly sped up the operation,
and we are now roughly 2x to 2.9x faster than HLL for the smaller integer data
types 👍🏽
   
   ```
   group                                          branch                                 main
   -----                                          ------                                 ----
   approx_distinct i16 bitmap                     1.00      3.1±0.23µs        ? ?/sec    1.94      5.9±0.08µs        ? ?/sec
   approx_distinct i64 80% distinct               1.01      5.8±0.11µs        ? ?/sec    1.00      5.8±0.14µs        ? ?/sec
   approx_distinct i64 99% distinct               1.02      6.0±0.39µs        ? ?/sec    1.00      5.8±0.17µs        ? ?/sec
   approx_distinct i8 bitmap                      1.00      2.1±0.17µs        ? ?/sec    2.87      5.9±0.06µs        ? ?/sec
   approx_distinct u16 bitmap                     1.00      3.0±0.05µs        ? ?/sec    1.95      5.8±0.23µs        ? ?/sec
   approx_distinct u8 bitmap                      1.00      2.2±0.18µs        ? ?/sec    2.69      5.8±0.34µs        ? ?/sec
   approx_distinct utf8 long 80% distinct         1.00     16.3±0.57µs        ? ?/sec    1.00     16.2±0.49µs        ? ?/sec
   approx_distinct utf8 long 99% distinct         1.00     16.3±0.39µs        ? ?/sec    1.00     16.2±0.23µs        ? ?/sec
   approx_distinct utf8 short 80% distinct        1.01     11.1±0.47µs        ? ?/sec    1.00     11.0±0.08µs        ? ?/sec
   approx_distinct utf8 short 99% distinct        1.00     11.1±0.51µs        ? ?/sec    1.00     11.0±0.48µs        ? ?/sec
   approx_distinct utf8view long 80% distinct     1.00     19.0±2.63µs        ? ?/sec    1.00     19.0±0.49µs        ? ?/sec
   approx_distinct utf8view long 99% distinct     1.00     19.0±2.09µs        ? ?/sec    1.00     19.0±0.19µs        ? ?/sec
   approx_distinct utf8view short 80% distinct    1.00      6.1±0.22µs        ? ?/sec    1.05      6.3±0.40µs        ? ?/sec
   approx_distinct utf8view short 99% distinct    1.00      6.1±0.34µs        ? ?/sec    1.02      6.2±0.22µs        ? ?/sec
   ```
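
For readers who have not opened the PR, here is a minimal sketch of the two layouts being contrasted for 8-bit values. It is not the code from the PR: the type names `BoolTableDistinct` and `PackedDistinct`, the helper `distinct_u8`, and the `[u64; 4]` word layout of the packed variant are all assumptions made for illustration. The `[bool; 256]` table spends one byte per possible value (256 bytes) and marks a value with a plain store; the packed variant fits in 32 bytes but needs a shift, a mask, and a read-modify-write per insert.

```rust
/// Hypothetical exact distinct counter for 8-bit values:
/// one `bool` slot per possible value (256 bytes total).
pub struct BoolTableDistinct {
    seen: [bool; 256],
}

impl BoolTableDistinct {
    pub fn new() -> Self {
        Self { seen: [false; 256] }
    }

    /// Mark a value as seen with a single indexed store.
    pub fn insert_u8(&mut self, v: u8) {
        self.seen[v as usize] = true;
    }

    /// `i8` values reuse the same table: the cast reinterprets the bits,
    /// so -128..=127 maps one-to-one onto the 256 slots.
    pub fn insert_i8(&mut self, v: i8) {
        self.insert_u8(v as u8);
    }

    /// Combine two partial states (slot-wise OR).
    pub fn merge(&mut self, other: &Self) {
        for (a, b) in self.seen.iter_mut().zip(other.seen.iter()) {
            *a |= *b;
        }
    }

    /// Number of distinct values seen so far.
    pub fn count(&self) -> usize {
        self.seen.iter().filter(|&&b| b).count()
    }
}

/// Packed alternative for comparison: 256 bits in four `u64` words.
/// The word layout is an assumption, not taken from the PR.
pub struct PackedDistinct {
    words: [u64; 4],
}

impl PackedDistinct {
    pub fn new() -> Self {
        Self { words: [0; 4] }
    }

    /// Select the word with the top two bits, the bit with the low six.
    pub fn insert_u8(&mut self, v: u8) {
        self.words[(v >> 6) as usize] |= 1u64 << (v & 63);
    }

    pub fn count(&self) -> usize {
        self.words.iter().map(|w| w.count_ones() as usize).sum()
    }
}

/// Hypothetical helper: count distinct values in a slice with the bool table.
pub fn distinct_u8(values: &[u8]) -> usize {
    let mut table = BoolTableDistinct::new();
    for &v in values {
        table.insert_u8(v);
    }
    table.count()
}
```

Both variants return the same exact count for 8-bit inputs; they differ only in memory footprint and per-insert work, which is the trade-off the benchmark numbers above are measuring.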


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
