Re: D for BigData: the first BetterC library by Tamediadigital

Cym13 via Digitalmars-d-announce Sun, 01 May 2022 23:21:41 -0700

On Monday, 2 May 2022 at 05:22:07 UTC, test123 wrote:

https://forum.dlang.org/post/[email protected]
On Saturday, 25 February 2017 at 14:32:00 UTC, Ilya Yaroshenko wrote:
HyperLogLog++ is an advanced cardinality estimation algorithm with normal and compressed sparse representations. It can be used to estimate the approximate number of unique elements in an unordered set.
hll-d [1, 2] is written in D. It can be used as a betterC library without linking with DRuntime. hll-d has a C header and a C example.
Its implementation is based on Mir Algorithm [3]:
  1. mir.ndslice.topology.bitpack is used for arrays composed of packed 6-bit integers.
  2. mir.ndslice.sorting.sort is used for betterC sorting.

[1] Git: https://github.com/tamediadigital/hll-d
[2] Dub: http://code.dlang.org/packages/hll-d
[3] Mir Algorithm: https://github.com/libmir/mir-algorithm

Best regards,
Ilya
Thanks for the great work.
I checked the C API but cannot figure out how to get the count for a single element.
For example, if I use it as an IP counter, is there a way to know how many times a given IP has been added to the set?

No, that's not what this is for. HyperLogLog is useful if you have a big dataset that may contain duplicates and you want to know how many unique items you have (with a reasonable probability). For example, a website can use it to estimate how many visitors it has without having to store every single IP address to check for duplicates on new connections. The tradeoff is that it's probabilistic: you don't need to store every address, so you need much less space and time to get a count of unique IPs, but you have to accept a margin of error on that result, and you can't know what the IPs were in the first place, just how many of them there are.
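
To make the tradeoff concrete, here is a minimal sketch of plain HyperLogLog in Python. It is not hll-d's code or its C API (hll-d implements HyperLogLog++ in D, with sparse representations this sketch omits). The class name TinyHyperLogLog, the precision p=12 and the SHA-256-based hash are all assumptions chosen for illustration.

```python
import hashlib
import math


class TinyHyperLogLog:
    """Plain HyperLogLog for illustration only (hypothetical, not hll-d)."""

    def __init__(self, p=12):
        self.p = p                     # number of hash bits used as register index
        self.m = 1 << p                # number of registers (4096 for p=12)
        self.registers = [0] * self.m  # each value fits in 6 bits
        # Bias-correction constant, valid for m >= 128.
        self.alpha = 0.7213 / (1 + 1.079 / self.m)

    def add(self, item: str) -> None:
        # Derive a 64-bit hash from the item (assumption: SHA-256 truncated).
        h = int.from_bytes(hashlib.sha256(item.encode()).digest()[:8], "big")
        idx = h >> (64 - self.p)                # top p bits pick a register
        rest = h & ((1 << (64 - self.p)) - 1)   # remaining 64 - p bits
        # Rank: 1-based position of the leftmost 1-bit in the remaining bits.
        rank = (64 - self.p) - rest.bit_length() + 1
        # Only the maximum rank per register is kept; the item itself is not stored.
        self.registers[idx] = max(self.registers[idx], rank)

    def count(self) -> float:
        raw = self.alpha * self.m ** 2 / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            # Small-range correction: fall back to linear counting.
            return self.m * math.log(self.m / zeros)
        return raw


if __name__ == "__main__":
    hll = TinyHyperLogLog()
    for i in range(50_000):
        ip = f"10.{(i >> 16) & 255}.{(i >> 8) & 255}.{i & 255}"
        hll.add(ip)
        hll.add(ip)  # duplicates never change a register
    print(f"estimated unique IPs: {hll.count():.0f}")
```

With 4096 registers the expected relative error is about 1.04 / sqrt(4096), roughly 1.6%, while the state stays at 4096 small integers no matter how many addresses are added. Because each register keeps only a maximum rank, there is nothing to query per IP; counting how often one address appears needs a different structure, such as a hash map or a count-min sketch.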
