As stated before, Algebird has several data structures for computing these, such as
QTree, or there is Ted's t-digest. You can also look at stream-lib's Q-Digest:
https://github.com/addthis/stream-lib/blob/master/src/main/java/com/clearspring/analytics/stream/quantile/QDigest.java
 
Another option is Frugal Streaming, which is well described, with an
implementation, on the AK blog:
http://blog.aggregateknowledge.com/2013/09/16/sketch-of-the-day-frugal-streaming/
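
To give a rough idea of how little state Frugal Streaming needs, here is a
minimal sketch of the Frugal-1U variant as I understand it. It is written in
Python only for brevity and is not taken from the AK blog's implementation; the
names frugal_1u, step and seed are my own, and the fixed step size of 1 assumes
the values live on a roughly integer scale.

```python
import random


def frugal_1u(stream, q, step=1.0, seed=None):
    """Estimate the q-quantile of a stream while keeping one number of state.

    q is the target quantile in (0, 1), e.g. 0.5 for the median.
    step and seed are illustrative parameters, not part of any library API.
    """
    rng = random.Random(seed)
    estimate = 0.0  # the only state carried between items
    for value in stream:
        r = rng.random()
        if value > estimate and r > 1.0 - q:
            # Item is above the estimate: move up with probability q.
            estimate += step
        elif value < estimate and r > q:
            # Item is below the estimate: move down with probability 1 - q.
            estimate -= step
    return estimate


if __name__ == "__main__":
    data_rng = random.Random(0)
    data = [data_rng.randint(0, 1000) for _ in range(50000)]
    print("approximate median:", frugal_1u(data, 0.5, seed=1))
    print("approximate 90th percentile:", frugal_1u(data, 0.9, seed=2))
```

The estimate drifts toward the point where upward and downward moves balance,
which is the requested quantile. It is only approximate and adapts slowly when
the starting value is far from the answer, so treat it as an illustration
rather than a replacement for QTree or Q-Digest.
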
There are some examples in the Spark Streaming samples showing how to integrate
Algebird.
Sam Bessalah

> On Dec 5, 2013, at 5:41 AM, Ryan Weald <r...@weald.com> wrote:
> 
> Hi Sandy,
> You could take a look at using the Q-Tree data structure that is provided
> by Twitter's 
> Algebird<https://github.com/twitter/algebird/blob/develop/algebird-core/src/main/scala/com/twitter/algebird/QTree.scala>.
> Due to the associative properties of Algebird's Semigroup, it is ideally
> suited for streaming computations.
> 
> -Ryan
> 
> 
>> On Wed, Dec 4, 2013 at 8:32 PM, Sandy Ryza <sandy.r...@cloudera.com> wrote:
>> 
>> Hi All,
>> 
>> We're working on a Spark application that could make use of computing
>> quantiles in a streaming fashion.  Something in the vein of what DataFu has
>> for Pig:
>> 
>> http://linkedin.github.io/datafu/docs/current/datafu/pig/stats/StreamingQuantile.html
>> 
>> Does anything like this exist in the Spark ecosystem?  If not, would there
>> be a good place to contribute this if we write it?
>> 
>> thanks,
>> Sandy
>> 
