Bloom Filter, MinHash, and HyperLogLog are among the most commonly used algorithms in Big Data. I think having them in the Malhar library would be a good idea.
There's a ticket for HyperLogLog that was created a long time ago: https://malhar.atlassian.net/browse/MLHR-1822

On Tue, Dec 8, 2015 at 5:42 PM, Chandni Singh <[email protected]> wrote:

> Hi,
>
> We need to add a BloomFilter implementation in Malhar. ManagedState has a
> use for it and I am pretty sure we will come up with more and more use cases
> that will need it. Tim's suggestion on Spill-able/Spooled data structures
> may use it too.
>
> Chandni
