Ted - Any chance we can add your quantile estimator to stream-lib?
Matt

On Wed, Nov 13, 2013 at 5:38 AM, Ted Dunning <[email protected]> wrote:

> I also have a new quantile estimator that dominates all other
> implementations that I know of on speed and accuracy (10us per point added,
> 8K data size to get a few ppm accuracy for high or low quantiles and about
> 0.05% accuracy on middle quantiles like the median).
>
> On Wed, Nov 13, 2013 at 8:53 AM, Dmitriy Ryaboy <[email protected]> wrote:
>
>> Summingbird uses algebird. I think Stripe might also have a library, Avi
>> Bryant was toying with this for a while.
>>
>> Algebird has some nice features like not doing approximation at all for
>> small sets (just use the real values), etc. we also recently did a bunch of
>> work to make sure we can serialize all approximate structures so they can
>> be correctly reused by different computations, sent across the wire, etc.
>>
>> I don't recall doing speed comparisons and the like, it would be
>> interesting to see them if you guys are choosing what library to use.
>>
>> On Nov 13, 2013, at 12:33 AM, Ted Dunning <[email protected]> wrote:
>>
>> > stream-lib is used quite widely and is generally high quality.
>> >
>> > The other competitive library is Brick House from Klout.
>> >
>> > http://engineering.klout.com/2013/01/introducing-brickhouse-major-open-source-release-from-klout/
>> >
>> > On Tue, Nov 12, 2013 at 7:28 PM, Timothy Chen <[email protected]> wrote:
>> >
>> >> Just saw this library today and thought it's something we can
>> >> potentially leverage:
>> >>
>> >> https://github.com/addthis/stream-lib
>> >>
>> >> It has a number of algo for approximation streams and has code for
>> >> cardinality estimation (HyperLogLog) and others.
>> >>
>> >> Looks like Twitter's SummingBird uses this library too.
>> >>
>> >> Tim
