This blog post, and the links within it, also seem relevant. It appears to have Python code available to try things out as well.
https://dataorigami.net/blogs/napkin-folding/19055451-percentile-and-quantile-estimation-of-big-data-the-t-digest

-Eric

On Wed, Apr 15, 2015 at 11:24 AM, Benjamin Root <[email protected]> wrote:

> "Then you can set about convincing matplotlib and friends to
> use it by default"
>
> Just to note, this proposal was originally made over in the matplotlib
> project. We sent it over here where its benefits would have wider reach.
> Matplotlib's plan is not to change the defaults, but to offload as much as
> possible to numpy so that it can support these new features if they are
> available. We might need to do some input validation so that users running
> older version of numpy can get a sensible error message.
>
> Cheers!
> Ben Root
>
>
> On Tue, Apr 14, 2015 at 7:12 PM, Nathaniel Smith <[email protected]> wrote:
>
>> On Mon, Apr 13, 2015 at 8:02 AM, Neil Girdhar <[email protected]>
>> wrote:
>> > Can I suggest that we instead add the P-square algorithm for the dynamic
>> > calculation of histograms?
>> > (http://pierrechainais.ec-lille.fr/Centrale/Option_DAD/IMPACT_files/Dynamic%20quantiles%20calcultation%20-%20P2%20Algorythm.pdf)
>> >
>> > This is already implemented in C++'s boost library
>> > (http://www.boost.org/doc/libs/1_44_0/boost/accumulators/statistics/extended_p_square.hpp)
>> >
>> > I implemented it in Boost Python as a module, which I'm happy to share.
>> > This is much better than fixed-width histograms in practice. Rather than
>> > adjusting the number of bins, it adjusts what you really want, which is the
>> > resolution of the bins throughout the domain.
>>
>> This definitely sounds like a useful thing to have in numpy or scipy
>> (though if it's possible to do without using Boost/C++ that would be
>> nice). But yeah, we should leave the existing histogram alone (in this
>> regard) and add a new name for this like "adaptive_histogram" or
>> something. Then you can set about convincing matplotlib and friends to
>> use it by default :-)
>>
>> -n
>>
>> --
>> Nathaniel J. Smith -- http://vorpus.org
>> _______________________________________________
>> NumPy-Discussion mailing list
>> [email protected]
>> http://mail.scipy.org/mailman/listinfo/numpy-discussion
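
A rough illustration of the point quoted above about adjusting bin resolution rather than bin count. This is not the P-square algorithm, not the t-digest, and not any existing or proposed NumPy API. It is a minimal sketch that assumes all the data fits in memory and uses exact quantiles from Python's standard library as a stand-in for the streaming estimates. The helper names (fixed_width_edges, equal_count_edges, counts) are made up for this example.

```python
import bisect
import random
import statistics


def fixed_width_edges(data, n_bins):
    """Edges of n_bins equally wide bins spanning [min(data), max(data)]."""
    lo, hi = min(data), max(data)
    width = (hi - lo) / n_bins
    return [lo + i * width for i in range(n_bins)] + [hi]


def equal_count_edges(data, n_bins):
    """Edges placed at exact quantiles, so bins are narrow where data is dense."""
    # statistics.quantiles returns the n_bins - 1 interior cut points.
    inner = statistics.quantiles(data, n=n_bins, method="inclusive")
    return [min(data)] + inner + [max(data)]


def counts(data, edges):
    """Number of samples per bin; the last bin is closed on the right."""
    out = [0] * (len(edges) - 1)
    for x in data:
        i = bisect.bisect_right(edges, x) - 1
        out[min(i, len(out) - 1)] += 1
    return out


# Heavily skewed synthetic sample (log-normal), seeded for repeatability.
rng = random.Random(0)
data = [rng.lognormvariate(0.0, 1.0) for _ in range(10_000)]

fixed = counts(data, fixed_width_edges(data, 10))
adaptive = counts(data, equal_count_edges(data, 10))

print("fixed-width counts:", fixed)
print("equal-count counts:", adaptive)
```

With skewed data like this, the fixed-width version puts most samples in the first bin or two, while the quantile-based edges spread them roughly evenly. A streaming estimator such as P-square aims for similar edges without keeping the whole sample around.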
