Thanks all for the suggestions. Exactly what I was looking for.

-Sandy

On Thu, Dec 5, 2013 at 5:00 AM, Sam Bessalah <samkil...@gmail.com> wrote:

> Just as stated before, Algebird has many data structures to compute those,
> like QTree, or Ted's t-digest. Or you can look at stream-lib's Q-Digest:
> https://github.com/addthis/stream-lib/blob/master/src/main/java/com/clearspring/analytics/stream/quantile/QDigest.java
> Another one is Frugal Streaming, which is well described, with an
> implementation, on the AK blog:
>
> http://blog.aggregateknowledge.com/2013/09/16/sketch-of-the-day-frugal-streaming/
> There are some examples in the Spark Streaming samples showing how to
> integrate Algebird.
> Sam Bessalah
>
>
> On Dec 5, 2013, at 5:41 AM, Ryan Weald <r...@weald.com> wrote:
> >
> > Hi Sandy,
> > You could take a look at using the Q-Tree data structure that is provided
> > by Twitter's Algebird:
> > https://github.com/twitter/algebird/blob/develop/algebird-core/src/main/scala/com/twitter/algebird/QTree.scala
> > Because Algebird's Semigroup operation is associative, it is ideally
> > suited for streaming computations.
> >
> > -Ryan
> >
> >
> >> On Wed, Dec 4, 2013 at 8:32 PM, Sandy Ryza <sandy.r...@cloudera.com> wrote:
> >>
> >> Hi All,
> >>
> >> We're working on a Spark application that could make use of computing
> >> quantiles in a streaming fashion. Something in the vein of what DataFu
> >> has for Pig:
> >>
> >> http://linkedin.github.io/datafu/docs/current/datafu/pig/stats/StreamingQuantile.html
> >>
> >> Does anything like this exist in the Spark ecosystem? If not, would there
> >> be a good place to contribute this if we write it?
> >>
> >> thanks,
> >> Sandy
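
For anyone reading this thread later, the Frugal Streaming idea Sam mentions can be sketched in a few lines. The following is a minimal Python sketch of the Frugal-1U update rule, assuming integer-valued data and a fixed step of 1. The function name `frugal_1u` and its parameters are hypothetical, chosen for illustration; this is not the code from the AK blog, Algebird, or stream-lib.

```python
import random


def frugal_1u(stream, q=0.5, seed=None, initial=0):
    """Estimate the q-th quantile of `stream` while storing a single value.

    Hypothetical helper for illustration only; not part of any library.
    """
    rng = random.Random(seed)
    estimate = initial
    for item in stream:
        r = rng.random()
        if item > estimate and r < q:
            # Item is above the estimate: step up with probability q.
            estimate += 1
        elif item < estimate and r >= q:
            # Item is below the estimate: step down with probability 1 - q.
            estimate -= 1
    return estimate


if __name__ == "__main__":
    # Assumed test data: uniform integers in [0, 1000].
    data_rng = random.Random(42)
    data = (data_rng.randint(0, 1000) for _ in range(100_000))
    # For this uniform stream the estimate is expected to settle near 900.
    print(frugal_1u(data, q=0.9, seed=7))
```

The estimate balances where stepping up (probability q times the fraction of items above it) matches stepping down (probability 1 - q times the fraction below it), which is the q-th quantile. With a unit step, convergence takes roughly as many items as the distance from `initial` to the true quantile, so this fits long streams of integer-like values better than short or widely scaled ones.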