coderfender commented on PR #21453:
URL: https://github.com/apache/datafusion/pull/21453#issuecomment-4204666466
It seems my bitmap setup was suboptimal for `u8`/`i8`. Instead of the densely packed `[u8;4]`, I dropped the bit packing, which I suspected might be causing cache misses, and went with a `[bool; 256]`. This significantly sped up the operation. The bitmap path is now about 1.9x faster than HLL for 16-bit integers and about 2.7x to 2.9x faster for 8-bit integers 👍🏽. The other types (i64, utf8, utf8view) are essentially unchanged.
```
group                                        branch                        main
-----                                        ------                        ----
approx_distinct i16 bitmap                   1.00   3.1±0.23µs  ? ?/sec    1.94   5.9±0.08µs  ? ?/sec
approx_distinct i64 80% distinct             1.01   5.8±0.11µs  ? ?/sec    1.00   5.8±0.14µs  ? ?/sec
approx_distinct i64 99% distinct             1.02   6.0±0.39µs  ? ?/sec    1.00   5.8±0.17µs  ? ?/sec
approx_distinct i8 bitmap                    1.00   2.1±0.17µs  ? ?/sec    2.87   5.9±0.06µs  ? ?/sec
approx_distinct u16 bitmap                   1.00   3.0±0.05µs  ? ?/sec    1.95   5.8±0.23µs  ? ?/sec
approx_distinct u8 bitmap                    1.00   2.2±0.18µs  ? ?/sec    2.69   5.8±0.34µs  ? ?/sec
approx_distinct utf8 long 80% distinct       1.00  16.3±0.57µs  ? ?/sec    1.00  16.2±0.49µs  ? ?/sec
approx_distinct utf8 long 99% distinct       1.00  16.3±0.39µs  ? ?/sec    1.00  16.2±0.23µs  ? ?/sec
approx_distinct utf8 short 80% distinct      1.01  11.1±0.47µs  ? ?/sec    1.00  11.0±0.08µs  ? ?/sec
approx_distinct utf8 short 99% distinct      1.00  11.1±0.51µs  ? ?/sec    1.00  11.0±0.48µs  ? ?/sec
approx_distinct utf8view long 80% distinct   1.00  19.0±2.63µs  ? ?/sec    1.00  19.0±0.49µs  ? ?/sec
approx_distinct utf8view long 99% distinct   1.00  19.0±2.09µs  ? ?/sec    1.00  19.0±0.19µs  ? ?/sec
approx_distinct utf8view short 80% distinct  1.00   6.1±0.22µs  ? ?/sec    1.05   6.3±0.40µs  ? ?/sec
approx_distinct utf8view short 99% distinct  1.00   6.1±0.34µs  ? ?/sec    1.02   6.2±0.22µs  ? ?/sec
```
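The two 8-bit bitmap layouts discussed in this comment can be illustrated with the minimal sketch below. It is not the code from this PR. Because `u8`/`i8` have only 256 possible values, a bitmap over the whole domain gives an exact distinct count. The type and method names (`BoolBitmapU8`, `PackedBitmapU8`, `update`, `merge`, `count`) are hypothetical. The packed variant is assumed here to be four `u64` words (256 bits), which is not necessarily how the original packed version was laid out. An `i8` input would first need mapping to an index, for example `(v as u8) as usize`.

```rust
/// Hypothetical exact distinct counter for 8-bit values using one `bool`
/// per possible value (256 bytes of state).
struct BoolBitmapU8 {
    seen: [bool; 256],
}

impl BoolBitmapU8 {
    fn new() -> Self {
        Self { seen: [false; 256] }
    }

    fn update(&mut self, values: &[u8]) {
        for &v in values {
            // Plain indexed store: no shift or mask per value.
            self.seen[v as usize] = true;
        }
    }

    fn merge(&mut self, other: &Self) {
        // Union of two partial states (e.g. from different partitions).
        for (a, b) in self.seen.iter_mut().zip(other.seen.iter()) {
            *a |= *b;
        }
    }

    fn count(&self) -> u64 {
        self.seen.iter().filter(|&&b| b).count() as u64
    }
}

/// Hypothetical densely packed alternative: 256 bits in four `u64` words
/// (32 bytes of state). This layout is an assumption for illustration.
struct PackedBitmapU8 {
    words: [u64; 4],
}

impl PackedBitmapU8 {
    fn new() -> Self {
        Self { words: [0; 4] }
    }

    fn update(&mut self, values: &[u8]) {
        for &v in values {
            // High 2 bits pick the word, low 6 bits pick the bit within it.
            self.words[(v >> 6) as usize] |= 1u64 << (v & 63);
        }
    }

    fn merge(&mut self, other: &Self) {
        for (a, b) in self.words.iter_mut().zip(other.words.iter()) {
            *a |= *b;
        }
    }

    fn count(&self) -> u64 {
        self.words.iter().map(|w| w.count_ones() as u64).sum()
    }
}

fn main() {
    let left: [u8; 4] = [1, 2, 2, 255];
    let right: [u8; 3] = [0, 1, 128];

    let mut a = BoolBitmapU8::new();
    a.update(&left);
    let mut a_other = BoolBitmapU8::new();
    a_other.update(&right);
    a.merge(&a_other);

    let mut b = PackedBitmapU8::new();
    b.update(&left);
    let mut b_other = PackedBitmapU8::new();
    b_other.update(&right);
    b.merge(&b_other);

    // Both layouts count the union {0, 1, 2, 128, 255}.
    println!("bool: {}, packed: {}", a.count(), b.count()); // prints "bool: 5, packed: 5"
}
```

The trade-off in this sketch is 256 bytes of state with a single store per value, versus 32 bytes of state with a shift, mask and OR per value. Which one wins in practice is what the benchmarks above measure.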