Hey Guys,

I saw this floating around Twitter recently:

https://github.com/tdunning/t-digest

Seems like it might be a good way to compute quantiles from a Samza task. Just throwing it out there in case anyone's interested.

One other thought would be to adapt this to a state store, so you could have predictable quantile computation (even in the face of failure). Keep in mind, though, that the algorithm is approximate, so you'd only get exactly the same approximate answer (hah!) in the case of failure. It does, however, take advantage of local disk.

Cheers,
Chris
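
P.S. A rough sketch of the state-store idea, in case it helps. To keep it dependency-free this is NOT t-digest: `HistogramSketch` is a hypothetical stand-in (a plain fixed-width histogram), and its `toBytes`/`fromBytes` methods are made up for illustration. The point is only that if the sketch's state is checkpointed as bytes and the later input is replayed in the same order, the restored sketch gives the same approximate quantile as one that never failed.

```java
import java.nio.ByteBuffer;

// Hypothetical stand-in for a quantile sketch (NOT t-digest):
// a fixed-width histogram over [min, max) with serializable state.
final class HistogramSketch {
    private final double min;
    private final double max;
    private final long[] counts;
    private long total;

    HistogramSketch(double min, double max, int buckets) {
        this.min = min;
        this.max = max;
        this.counts = new long[buckets];
    }

    void add(double x) {
        // Map x to a bucket index; out-of-range values are clamped
        // into the first or last bucket.
        int i = (int) ((x - min) * counts.length / (max - min));
        i = Math.max(0, Math.min(counts.length - 1, i));
        counts[i]++;
        total++;
    }

    // Approximate q-quantile: the upper edge of the first bucket at which
    // the cumulative count reaches ceil(q * total).
    double quantile(double q) {
        if (total == 0) return Double.NaN;
        long target = (long) Math.ceil(q * total);
        long seen = 0;
        for (int i = 0; i < counts.length; i++) {
            seen += counts[i];
            if (seen >= target) {
                return min + (i + 1) * (max - min) / counts.length;
            }
        }
        return max;
    }

    // Serialize the full state so it can be written to a store as a value.
    byte[] toBytes() {
        ByteBuffer buf = ByteBuffer.allocate(
                2 * Double.BYTES + Integer.BYTES + Long.BYTES * counts.length);
        buf.putDouble(min).putDouble(max).putInt(counts.length);
        for (long c : counts) buf.putLong(c);
        return buf.array();
    }

    // Rebuild a sketch from bytes previously produced by toBytes().
    static HistogramSketch fromBytes(byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        HistogramSketch s =
                new HistogramSketch(buf.getDouble(), buf.getDouble(), buf.getInt());
        for (int i = 0; i < s.counts.length; i++) {
            s.counts[i] = buf.getLong();
            s.total += s.counts[i];
        }
        return s;
    }
}
```

In a Samza task, the bytes from `toBytes()` would presumably be the value written to the task's key-value store, with `fromBytes()` called when the task restarts. A real version would swap the histogram for a t-digest and use whatever serialization that library provides.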
