Hi,

We use it in SPM (performance monitoring). We've contributed a patch to lower the memory footprint for QDigest. The lib works well! :)
Otis
--
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/

On Sat, Nov 16, 2013 at 12:11 AM, Eugene Kirpichov <[email protected]> wrote:

> Hi,
>
> Actually no, I have not used it in production, I wrote it for fun :)
> But I know that it's used in TempoDB. You might want to contact them and
> ask -
> http://blog.tempo-db.com/post/42318820124/estimating-percentiles-on-streams-of-data
>
> On Thu Nov 14 2013 at 6:17:15 AM, Matt Abrams <[email protected]> wrote:
>
>> Great! I have not used Q digest in production yet but I believe
>> Eugene, the author of stream-lib's Q digest implementation, has.
>> Eugene, can you comment on how it performs in practice?
>>
>> Matt
>>
>> On Wed, Nov 13, 2013 at 4:45 PM, Ted Dunning <[email protected]> wrote:
>> >
>> > As soon as it is for sure done. I have one more significant improvement
>> > to make so that it works on sequential values. I will hand the code to
>> > suneel who will be packaging it for mahout. You can def have it at the
>> > same time.
>> >
>> > I would love a review from you guys when I am ready. The theory doc is
>> > nearly to that point. Would you like I start there? Also, can I get some
>> > info from you about how q digests work in practice?
>> >
>> > Sent from my iPhone
>> >
>> > On Nov 13, 2013, at 20:46, Matt Abrams <[email protected]> wrote:
>> >
>> >> Ted -
>> >>
>> >> Any chance we can add your quantile estimator to stream-lib?
>> >>
>> >> Matt
>> >>
>> >> On Wed, Nov 13, 2013 at 5:38 AM, Ted Dunning <[email protected]> wrote:
>> >>> I also have a new quantile estimator that dominates all other
>> >>> implementations that I know of on speed and accuracy (10us per point
>> >>> added, 8K data size to get a few ppm accuracy for high or low
>> >>> quantiles and about 0.05% accuracy on middle quantiles like the
>> >>> median).
>> >>>
>> >>> On Wed, Nov 13, 2013 at 8:53 AM, Dmitriy Ryaboy <[email protected]> wrote:
>> >>>
>> >>>> Summingbird uses algebird. I think Stripe might also have a library,
>> >>>> Avi Bryant was toying with this for a while.
>> >>>>
>> >>>> Algebird has some nice features like not doing approximation at all
>> >>>> for small sets (just use the real values), etc. we also recently did
>> >>>> a bunch of work to make sure we can serialize all approximate
>> >>>> structures so they can be correctly reused by different computations,
>> >>>> sent across the wire, etc.
>> >>>>
>> >>>> I don't recall doing speed comparisons and the like, it would be
>> >>>> interesting to see them if you guys are choosing what library to use.
>> >>>>
>> >>>> On Nov 13, 2013, at 12:33 AM, Ted Dunning <[email protected]> wrote:
>> >>>>
>> >>>>> stream-lib is used quite widely and is generally high quality.
>> >>>>>
>> >>>>> The other competitive library is Brick House from Klout.
>> >>>>> http://engineering.klout.com/2013/01/introducing-brickhouse-major-open-source-release-from-klout/
>> >>>>>
>> >>>>> On Tue, Nov 12, 2013 at 7:28 PM, Timothy Chen <[email protected]> wrote:
>> >>>>>
>> >>>>>> Just saw this library today and thought it's something we can
>> >>>>>> potentially leverage:
>> >>>>>>
>> >>>>>> https://github.com/addthis/stream-lib
>> >>>>>>
>> >>>>>> It has a number of algo for approximation streams and has code for
>> >>>>>> cardinality estimation (HyperLogLog) and others.
>> >>>>>>
>> >>>>>> Looks like Twitter's SummingBird uses this library too.
>> >>>>>>
>> >>>>>> Tim
