I don't think there is an automatic method for correct binning. The methods mentioned in the pull request and related issue are all based on the assumption that the underlying distribution is Gaussian. There is absolutely no reason to assume that.
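To make the point concrete, here is a minimal sketch that feeds a clearly non-Gaussian (bimodal) sample to several of the string estimators NumPy accepts for `bins`. Which estimators the pull request actually discussed isn't stated here; this only assumes NumPy >= 1.15 (for `np.histogram_bin_edges`), and the bimodal test data is hypothetical. Some of these rules, such as Sturges' and Scott's, are derived with a normal distribution as the reference, so there is no reason to expect them to agree on data like this.

```python
import numpy as np

# Hypothetical test data: a strongly bimodal, clearly non-Gaussian sample.
rng = np.random.default_rng(0)
sample = np.concatenate([rng.normal(-5.0, 0.5, 500), rng.normal(5.0, 0.5, 500)])

# Number of bins each built-in estimator picks for the same sample
# (an array of k+1 edges describes k bins).
estimators = ["sturges", "scott", "fd", "doane", "rice", "auto"]
n_bins = {
    name: len(np.histogram_bin_edges(sample, bins=name)) - 1
    for name in estimators
}

for name, count in n_bins.items():
    print(f"{name:>8}: {count} bins")
```

Printing the counts side by side shows how much the choice of rule alone changes the picture for a single dataset.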
Reasonable expectations for automatic binning:
- it will be wrong most of the time.

Reasonable number of bins for a sample of size n:
- max(10, sqrt(n)), which ensures a large number of filled bins while still conveying information about the data values when n is small.

The documentation could point out that automatic binning should only be used for exploring a single dataset, as it is unsuited for comparing two different datasets. Automatically binned data is likewise unsuited for later use in testing whether two distributions are similar.

_______________________________________________
NumPy-Discussion mailing list -- numpy-discussion@python.org
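A minimal sketch of the max(10, sqrt(n)) rule of thumb, together with why per-dataset automatic binning gets in the way of comparisons. Rounding sqrt(n) up, the helper name `suggested_bin_count`, the exponential test data, and sizing the shared bins from the smaller sample are all assumptions made for illustration; the shared-edges step is one common alternative, not something prescribed above.

```python
import math

import numpy as np


def suggested_bin_count(n: int) -> int:
    """Hypothetical helper: max(10, sqrt(n)) bins for a sample of size n.

    Rounding sqrt(n) up to an integer is an assumption.
    """
    return max(10, math.ceil(math.sqrt(n)))


rng = np.random.default_rng(1)
a = rng.exponential(1.0, 400)   # hypothetical dataset A
b = rng.exponential(1.5, 2500)  # hypothetical dataset B

# Automatic binning gives each dataset its own edges, so bin i of A and
# bin i of B generally cover different intervals.
edges_a = np.histogram_bin_edges(a, bins="auto")
edges_b = np.histogram_bin_edges(b, bins="auto")

# For a comparison, fix one set of edges over the combined range instead,
# sized (by assumption) from the smaller sample.
lo = min(a.min(), b.min())
hi = max(a.max(), b.max())
shared_edges = np.linspace(lo, hi, suggested_bin_count(min(a.size, b.size)) + 1)
counts_a, _ = np.histogram(a, bins=shared_edges)
counts_b, _ = np.histogram(b, bins=shared_edges)

print(suggested_bin_count(50), suggested_bin_count(400), suggested_bin_count(2500))
print(len(edges_a) - 1, len(edges_b) - 1, len(shared_edges) - 1)
```

For small samples the floor of 10 dominates (n = 50 gives 10 bins); for larger ones sqrt(n) takes over (n = 400 gives 20, n = 2500 gives 50).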