On Monday, 2 May 2022 at 05:22:07 UTC, test123 wrote:
https://forum.dlang.org/post/[email protected]
On Saturday, 25 February 2017 at 14:32:00 UTC, Ilya Yaroshenko
wrote:
HyperLogLog++ is an advanced cardinality estimation algorithm
with normal and compressed sparse representations. It can be
used to estimate the approximate number of unique elements in an
unordered set.
hll-d [1, 2] is written in D. It can be used as a betterC
library without linking with DRuntime. hll-d has a C header and
a C example.
Its implementation is based on Mir Algorithm [3]:
1. mir.ndslice.topology.bitpack is used for arrays composed
of packed 6-bit integers.
2. mir.ndslice.sorting.sort is used for betterC sorting.
[1] Git: https://github.com/tamediadigital/hll-d
[2] Dub: http://code.dlang.org/packages/hll-d
[3] Mir Algorithm: https://github.com/libmir/mir-algorithm
Best regards,
Ilya
Thanks for the great work.
I checked the C API but cannot figure out how to get the count
for a single element.
For example, if I use it as an IP counter, is there a way to know
how many times a given IP has been added to the set?
No, that's not what this is for. HyperLogLog is useful if you
have a big dataset that may contain duplicates and you want to
know how many unique items you have (within a reasonable margin
of error). For example, a website can use it to estimate how
many visitors it has without having to store every single IP
address to check for duplicates on new connections. The
tradeoff is that it's probabilistic: you don't need to store
every address, so you need much less space and time to get a count
of unique IPs, but you have to accept a margin of error on that
result, and you can't know what the IPs were in the first place
or how often each one was added, just how many distinct ones
there are.
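
To make that concrete, here is a minimal sketch of a plain dense
HyperLogLog in Python. It is not hll-d's code or API and it is not
HyperLogLog++ (no sparse mode, no bias correction); the class name
TinyHLL, the SHA-1 based hash, and the default of 12 index bits are
all assumptions chosen for illustration. It shows why the structure
can only answer "how many distinct items": each register keeps just
a maximum bit position, so adding the same IP again changes nothing.

```python
import hashlib
import math


class TinyHLL:
    """Illustrative dense HyperLogLog (hypothetical, not hll-d)."""

    def __init__(self, p=12):
        self.p = p                      # number of index bits (assumed default)
        self.m = 1 << p                 # number of registers
        self.registers = [0] * self.m   # each value stays small enough for 6 bits

    def add(self, item):
        # Derive a 64-bit hash from the first 8 bytes of SHA-1.
        digest = hashlib.sha1(item.encode()).digest()
        h = int.from_bytes(digest[:8], "big")
        idx = h >> (64 - self.p)                 # top p bits select a register
        rest = h & ((1 << (64 - self.p)) - 1)    # remaining 64 - p bits
        # Rank: 1-based position of the leftmost 1 bit in `rest`.
        rank = (64 - self.p) - rest.bit_length() + 1
        # Only the maximum rank is kept; duplicates never change it.
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def count(self):
        m = self.m
        alpha = 0.7213 / (1 + 1.079 / m)         # constant for m >= 128
        raw = alpha * m * m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * m and zeros:
            # Small-range correction: fall back to linear counting.
            return m * math.log(m / zeros)
        return raw


if __name__ == "__main__":
    hll = TinyHLL()
    for _ in range(3):                           # every IP is added three times
        for i in range(10000):
            hll.add(f"10.0.{i // 256}.{i % 256}")
    print("estimated unique IPs:", round(hll.count()))
```

With 4096 registers the expected relative error is roughly 1.6%, and
the whole state is 4096 small integers no matter how many addresses
are added. If you need the number of hits per IP, you need a
different structure, for example a hash map from IP to counter,
which costs memory proportional to the number of distinct IPs.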