As stated before, Algebird has several data structures for computing quantiles, such as QTree, and there is also Ted's t-digest.

You could also look at the QDigest in stream-lib:
https://github.com/addthis/stream-lib/blob/master/src/main/java/com/clearspring/analytics/stream/quantile/QDigest.java

Another option is Frugal Streaming, which is well described, with an implementation, on the AK blog:
http://blog.aggregateknowledge.com/2013/09/16/sketch-of-the-day-frugal-streaming/

There are also some examples in the Spark Streaming samples showing how to integrate Algebird.

Sam Bessalah
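To make the Frugal Streaming option concrete, here is a minimal Java sketch of the one-unit-step update described in the AK post. When an item is above the estimate, the estimate moves up by one with probability q. When an item is below it, the estimate moves down by one with probability 1 - q. Only a single number is kept per quantile. The class name FrugalQuantile and its methods are hypothetical. This is not the blog's code or stream-lib's API, and it assumes integer-valued items.

```java
import java.util.Random;

/**
 * Minimal sketch of a Frugal Streaming quantile estimator (one-unit steps).
 * Hypothetical class, not from any library; assumes integer-valued items.
 */
public class FrugalQuantile {
    private final double quantile; // target quantile in (0, 1), e.g. 0.5 for the median
    private final Random random;   // source of the coin flips that bias the walk
    private long estimate;         // the only state kept: current quantile estimate

    public FrugalQuantile(double quantile, long initialEstimate, long seed) {
        if (quantile <= 0.0 || quantile >= 1.0) {
            throw new IllegalArgumentException("quantile must be in (0, 1)");
        }
        this.quantile = quantile;
        this.estimate = initialEstimate;
        this.random = new Random(seed);
    }

    /** Update the estimate with one stream item. */
    public void offer(long item) {
        double r = random.nextDouble(); // uniform in [0, 1)
        if (item > estimate && r > 1.0 - quantile) {
            estimate++;                 // step up with probability q
        } else if (item < estimate && r > quantile) {
            estimate--;                 // step down with probability 1 - q
        }
        // items equal to the estimate leave it unchanged
    }

    public long estimate() {
        return estimate;
    }

    public static void main(String[] args) {
        // Feed 200,000 uniform random integers in [1, 1000]; the estimate
        // should settle near the true median (about 500).
        FrugalQuantile median = new FrugalQuantile(0.5, 0, 42L);
        Random data = new Random(7L);
        for (int i = 0; i < 200_000; i++) {
            median.offer(1 + data.nextInt(1000));
        }
        System.out.println("median estimate: " + median.estimate());
    }
}
```

The estimate converges slowly from a poor starting point because it moves one unit at a time. The upside is that it uses constant memory per quantile, which is the trade-off the blog post highlights.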
> On Dec 5, 2013, at 5:41 AM, Ryan Weald <r...@weald.com> wrote:
>
> Hi Sandy,
> You could take a look at using the Q-Tree data structure that is provided
> by Twitter's Algebird
> <https://github.com/twitter/algebird/blob/develop/algebird-core/src/main/scala/com/twitter/algebird/QTree.scala>.
> Due to the associative properties of Algebird's SemiGroup, it is ideally
> suited for streaming computations.
>
> -Ryan
>
>
>> On Wed, Dec 4, 2013 at 8:32 PM, Sandy Ryza <sandy.r...@cloudera.com> wrote:
>>
>> Hi All,
>>
>> We're working on a Spark application that could make use of computing
>> quantiles in a streaming fashion, something in the vein of what DataFu has
>> for Pig:
>>
>> http://linkedin.github.io/datafu/docs/current/datafu/pig/stats/StreamingQuantile.html
>>
>> Does anything like this exist in the Spark ecosystem? If not, would there
>> be a good place to contribute this if we write it?
>>
>> thanks,
>> Sandy
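Ryan's point about associativity can be illustrated with a deliberately simple stand-in. The Java sketch below is a hypothetical fixed-range bucket histogram, not Algebird's QTree. It assumes the value range is known in advance, and its merge is a bucket-wise sum. Because that merge is associative, summaries built on separate partitions or micro-batches can be combined in any grouping and give the same result.

```java
import java.util.Arrays;

/**
 * Hypothetical fixed-range histogram used only to illustrate an associative merge.
 * Not Algebird's QTree; assumes the value range [min, max] is known up front.
 */
public final class BucketHistogram {
    private final double min;    // lower bound of the tracked range
    private final double max;    // upper bound of the tracked range
    private final long[] counts; // one counter per equal-width bucket

    public BucketHistogram(double min, double max, int buckets) {
        if (!(max > min) || buckets <= 0) {
            throw new IllegalArgumentException("need max > min and buckets > 0");
        }
        this.min = min;
        this.max = max;
        this.counts = new long[buckets];
    }

    /** Record one value; values outside [min, max] are clamped into the edge buckets. */
    public void add(double value) {
        int i = (int) ((value - min) * counts.length / (max - min));
        i = Math.max(0, Math.min(counts.length - 1, i));
        counts[i]++;
    }

    /** Associative merge: bucket-wise sum of two histograms with the same layout. */
    public BucketHistogram merge(BucketHistogram other) {
        if (min != other.min || max != other.max || counts.length != other.counts.length) {
            throw new IllegalArgumentException("histograms must share the same buckets");
        }
        BucketHistogram out = new BucketHistogram(min, max, counts.length);
        for (int i = 0; i < counts.length; i++) {
            out.counts[i] = counts[i] + other.counts[i];
        }
        return out;
    }

    /** Approximate q-quantile: upper edge of the first bucket where the running count reaches q * total. */
    public double quantile(double q) {
        long total = Arrays.stream(counts).sum();
        if (total == 0) {
            throw new IllegalStateException("empty histogram");
        }
        double target = q * total;
        double width = (max - min) / counts.length;
        long running = 0;
        for (int i = 0; i < counts.length; i++) {
            running += counts[i];
            if (running >= target) {
                return min + (i + 1) * width;
            }
        }
        return max;
    }

    public long[] counts() {
        return counts.clone();
    }

    public static void main(String[] args) {
        // Three "partitions" of one stream, each summarized independently.
        BucketHistogram a = new BucketHistogram(0, 100, 100);
        BucketHistogram b = new BucketHistogram(0, 100, 100);
        BucketHistogram c = new BucketHistogram(0, 100, 100);
        for (int v = 0; v < 100; v++) {
            if (v % 3 == 0) a.add(v); else if (v % 3 == 1) b.add(v); else c.add(v);
        }
        BucketHistogram left = a.merge(b).merge(c);  // (a + b) + c
        BucketHistogram right = a.merge(b.merge(c)); // a + (b + c)
        System.out.println(Arrays.equals(left.counts(), right.counts())); // prints "true"
        System.out.println("approx median: " + left.quantile(0.5));
    }
}
```

Since `(a + b) + c` and `a + (b + c)` yield identical counts, a streaming job can reduce per-batch summaries in whatever order the framework schedules them. That property is what makes semigroup-based structures like QTree a good fit for Spark Streaming.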