Just saw this library today and thought it's something we could potentially leverage:
https://github.com/addthis/stream-lib

It has a number of algorithms for approximate summarization of streams, including cardinality estimation (HyperLogLog) and others. Looks like Twitter's Summingbird uses this library too.

Tim
