As soon as it is definitely done. I have one more significant improvement to make so that it works on sequential values. I will hand the code to Suneel, who will be packaging it for Mahout. You can definitely have it at the same time.
I would love a review from you guys when I am ready. The theory doc is nearly at that point. Would you like me to start there?

Also, can I get some info from you about how Q-digests work in practice?

Sent from my iPhone

On Nov 13, 2013, at 20:46, Matt Abrams <[email protected]> wrote:

> Ted -
>
> Any chance we can add your quantile estimator to stream-lib?
>
> Matt
>
> On Wed, Nov 13, 2013 at 5:38 AM, Ted Dunning <[email protected]> wrote:
>> I also have a new quantile estimator that dominates all other
>> implementations that I know of on speed and accuracy (10us per point added,
>> 8K data size to get a few ppm accuracy for high or low quantiles and about
>> 0.05% accuracy on middle quantiles like the median).
>>
>>
>> On Wed, Nov 13, 2013 at 8:53 AM, Dmitriy Ryaboy <[email protected]> wrote:
>>
>>> Summingbird uses algebird. I think Stripe might also have a library, Avi
>>> Bryant was toying with this for a while.
>>>
>>> Algebird has some nice features like not doing approximation at all for
>>> small sets (just use the real values), etc. we also recently did a bunch of
>>> work to make sure we can serialize all approximate structures so they can
>>> be correctly reused by different computations, sent across the wire, etc.
>>>
>>> I don't recall doing speed comparisons and the like, it would be
>>> interesting to see them if you guys are choosing what library to use.
>>>
>>> On Nov 13, 2013, at 12:33 AM, Ted Dunning <[email protected]> wrote:
>>>
>>>> stream-lib is used quite widely and is generally high quality.
>>>>
>>>> The other competitive library is Brick House from Klout.
>>>> http://engineering.klout.com/2013/01/introducing-brickhouse-major-open-source-release-from-klout/
>>>>
>>>>
>>>> On Tue, Nov 12, 2013 at 7:28 PM, Timothy Chen <[email protected]> wrote:
>>>>
>>>>> Just saw this library today and thought it's something we can potentially
>>>>> leverage:
>>>>>
>>>>> https://github.com/addthis/stream-lib
>>>>>
>>>>> It has a number of algo for approximation streams and has code for
>>>>> cardinality estimation (HyperLogLog) and others.
>>>>>
>>>>> Looks like Twitter's SummingBird uses this library too.
>>>>>
>>>>> Tim
