On Sat, Aug 10, 2013 at 8:59 AM, Ajo Fod <ajo....@gmail.com> wrote:

> If the data doesn't fit, you probably need a StorelessQuantile estimator
> like QuantileBin1D from the colt libraries. Then pick a resolution and do
> the single pass search.
>

This is peripheral to the actual topic, but the Colt libraries are out of date in almost every respect. When we added unit tests, even the most basic functions turned up dozens of serious bugs.

For more advanced estimation such as quantiles, nothing in Colt comes close to stream-lib. Even the Mahout on-line estimators are generally superior. QuantileBin1D, in particular, lacks the machinery of QDigests (not surprising, since they were published in 2004, long after Colt went dormant).

Check out the stream-lib implementation:

https://github.com/clearspring/stream-lib/blob/master/src/main/java/com/clearspring/analytics/stream/quantile/QDigest.java

and the original paper:

http://www.cs.virginia.edu/~son/cs851/papers/ucsb.sensys04.pdf
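
For reference, the "pick a resolution and do the single pass search" idea from the quoted message can be sketched roughly as below. This is only an illustration under stated assumptions: the class name BinnedQuantileSketch and its parameters are made up for this mail, the value range has to be known up front, and the answer is only accurate to one bin width. It is neither Colt's QuantileBin1D nor stream-lib's QDigest, which adapts its resolution to the data instead of fixing it in advance.

```java
import java.util.Random;

/**
 * Hypothetical illustration only: a fixed-resolution, single-pass quantile
 * estimator. Memory use is one long per bin, independent of the stream size.
 */
final class BinnedQuantileSketch {
    private final double min;
    private final double width;
    private final long[] counts;
    private long total;

    /** Assumes the caller knows a range [min, max) that covers the data. */
    BinnedQuantileSketch(double min, double max, int bins) {
        if (!(max > min) || bins <= 0) {
            throw new IllegalArgumentException("need max > min and bins > 0");
        }
        this.min = min;
        this.width = (max - min) / bins;
        this.counts = new long[bins];
    }

    /** Records one observation; out-of-range values are clamped to the end bins. */
    void add(double x) {
        if (Double.isNaN(x)) {
            throw new IllegalArgumentException("NaN is not supported");
        }
        int i = (int) Math.floor((x - min) / width);
        i = Math.max(0, Math.min(counts.length - 1, i));
        counts[i]++;
        total++;
    }

    /** Returns the midpoint of the bin that contains the q-th quantile. */
    double quantile(double q) {
        if (total == 0) {
            throw new IllegalStateException("no data");
        }
        if (q < 0.0 || q > 1.0) {
            throw new IllegalArgumentException("q must be in [0, 1]");
        }
        // Rank of the requested quantile, at least 1 so that q = 0 maps to the minimum.
        long target = Math.max(1L, (long) Math.ceil(q * total));
        long seen = 0;
        for (int i = 0; i < counts.length; i++) {
            seen += counts[i];
            if (seen >= target) {
                return min + (i + 0.5) * width;
            }
        }
        // Not reached: the counts always sum to total, so the loop returns.
        return min + (counts.length - 0.5) * width;
    }

    public static void main(String[] args) {
        BinnedQuantileSketch sketch = new BinnedQuantileSketch(0.0, 1.0, 1000);
        Random rng = new Random(42);
        for (int i = 0; i < 1_000_000; i++) {
            sketch.add(rng.nextDouble());
        }
        System.out.println("approximate median: " + sketch.quantile(0.5));
    }
}
```

The trade-off is the one raised above: with a fixed grid, the error is bounded by the bin width and heavily skewed data wastes most of the bins, which is the kind of problem QDigest is designed to handle.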