Also, QTree is implemented in Algebird, and it's not hard to get it working in Spark. It is another probabilistic data structure that is useful for this.
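To make the idea concrete, here is a heavily simplified sketch in Python. It is not Algebird's QTree, which is written in Scala and keeps a multi-level tree so it can trade resolution for size. The sketch keeps only the leaf level. Each partition counts its values into power-of-two-wide buckets, the per-partition summaries are merged by adding counts, and a quantile is reported as a bucket range that contains it rather than as a single value. The class name `BucketQuantileSketch` and its methods are hypothetical.

```python
import math
from collections import Counter


class BucketQuantileSketch:
    """Hypothetical, flattened simplification of a QTree: only the leaf level."""

    def __init__(self, level=0):
        self.level = level          # every bucket is 2**level wide
        self.counts = Counter()     # bucket offset -> number of values seen

    def add(self, x):
        self.counts[math.floor(x / 2 ** self.level)] += 1
        return self

    def merge(self, other):
        # Summaries from different partitions combine by adding bucket counts,
        # so the merge is associative and commutative (safe for a reduce).
        if self.level != other.level:
            raise ValueError("cannot merge sketches with different levels")
        out = BucketQuantileSketch(self.level)
        out.counts = self.counts + other.counts
        return out

    def quantile_bounds(self, q):
        # Return the [lo, hi) range of the bucket holding the element of
        # 0-based rank floor(q * (n - 1)); that element lies inside the range.
        n = sum(self.counts.values())
        if n == 0:
            raise ValueError("empty sketch")
        rank = math.floor(q * (n - 1))
        width = 2 ** self.level
        seen = 0
        for offset in sorted(self.counts):
            seen += self.counts[offset]
            if seen > rank:
                return offset * width, (offset + 1) * width
        raise AssertionError("unreachable for 0 <= q <= 1")
```

In Spark you would build one sketch per partition (for example with mapPartitions) and combine them with reduce. Memory grows with the number of non-empty buckets, and the precision of the answer is the bucket width.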
On Apr 17, 2017 11:27, "Jason White" <jason.wh...@shopify.com> wrote:

> Have you looked at t-digests?
>
> Calculating percentiles (including medians) is something that is inherently
> difficult/inefficient to do in a distributed system. T-digests provide a
> useful probabilistic structure to allow you to compute any percentile with a
> known (and tunable) margin of error.
>
> https://github.com/tdunning/t-digest
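For a rough idea of how a t-digest gets its tunable error, here is a compact, simplified "merging" t-digest in Python. It is an illustrative sketch, not the tdunning/t-digest library. It assumes the arcsine scale function (k1), which allows large clusters near the median and small ones near the tails, and a plain linear interpolation between centroid means. The names `SimpleTDigest`, `update`, `merge` and `quantile` are hypothetical. The `compression` parameter is the tuning knob: larger values keep more centroids and give smaller errors.

```python
import math


class SimpleTDigest:
    """Hypothetical, simplified merging t-digest (illustration only)."""

    def __init__(self, compression=100.0):
        self.compression = compression  # larger -> more centroids, more accuracy
        self.centroids = []  # list of (mean, weight), sorted by mean

    def _k(self, q):
        # Arcsine scale function k1: keeps clusters small near q=0 and q=1.
        q = min(max(q, 0.0), 1.0)
        return self.compression / (2 * math.pi) * math.asin(2 * q - 1)

    def _q_limit(self, q_left):
        # Largest right-hand quantile a cluster starting at q_left may reach,
        # i.e. each cluster may span at most 1 unit of k.
        k = self._k(q_left) + 1
        if k >= self.compression / 4:  # k(1) == compression / 4
            return 1.0
        return (math.sin(k * 2 * math.pi / self.compression) + 1) / 2

    def _rebuild(self, points):
        # Sort all (mean, weight) pairs and greedily merge neighbours
        # while each cluster stays within its size limit.
        points = sorted(points)
        if not points:
            self.centroids = []
            return
        total = sum(w for _, w in points)
        merged = []
        cur_mean, cur_w = points[0]
        w_before = 0.0  # weight of clusters already emitted
        limit = self._q_limit(0.0)
        for mean, w in points[1:]:
            if (w_before + cur_w + w) / total <= limit:
                cur_w += w
                cur_mean += (mean - cur_mean) * w / cur_w  # running weighted mean
            else:
                merged.append((cur_mean, cur_w))
                w_before += cur_w
                limit = self._q_limit(w_before / total)
                cur_mean, cur_w = mean, w
        merged.append((cur_mean, cur_w))
        self.centroids = merged

    def update(self, values):
        self._rebuild(self.centroids + [(float(v), 1.0) for v in values])
        return self

    def merge(self, other):
        # Digests from different partitions combine by re-clustering their centroids.
        self._rebuild(self.centroids + other.centroids)
        return self

    def quantile(self, q):
        cs = self.centroids
        if not cs:
            raise ValueError("empty digest")
        total = sum(w for _, w in cs)
        target = q * total
        cum = 0.0  # weight of centroids before the current one
        for i, (mean, w) in enumerate(cs):
            mid = cum + w / 2  # treat each mean as sitting at the middle of its weight
            if target < mid:
                if i == 0:
                    return mean
                prev_mean, prev_w = cs[i - 1]
                prev_mid = cum - prev_w / 2
                frac = (target - prev_mid) / (mid - prev_mid)
                return prev_mean + frac * (mean - prev_mean)
            cum += w
        return cs[-1][0]
```

As with any mergeable sketch, each partition builds its own digest and the digests are merged at the end. The merged result stays bounded at roughly `compression` centroids no matter how many values went in.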