I don't think there is an automatic method for correct binning.
The methods mentioned in the pull request and related issue are all based on
the assumption that the underlying distribution is Gaussian. There is
absolutely no reason to assume that.
Reasonable expectations for automatic binning:
- it will be wrong most of the time.
Reasonable number of bins for a sample of size n:
- max(10, sqrt(n)), to make sure there is a large number of filled bins
while still providing information about the data values for small n.
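
As a rough sketch of the rule above (the helper name suggested_bin_count and
rounding sqrt(n) up to an integer are my own assumptions, not something from
the pull request), this could look like:

```python
import math

import numpy as np


def suggested_bin_count(n):
    """Hypothetical helper: max(10, sqrt(n)), with sqrt(n) rounded up."""
    # Rounding up is an assumption; the rule itself does not say how to round.
    return max(10, math.ceil(math.sqrt(n)))


# Toy data, deliberately not Gaussian.
rng = np.random.default_rng(0)
sample = rng.exponential(size=400)

# Pass the bin count explicitly instead of relying on an automatic estimator.
bins = suggested_bin_count(sample.size)
counts, edges = np.histogram(sample, bins=bins)
```

For small samples the floor of 10 applies, so a sample of 50 values still
gets 10 bins rather than 8.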
The documentation could point out that automatic binning should only be used
for exploring a single data set, as it is unsuited for comparing two different
data sets. Automatically binned data is also not suited for later use in
testing distribution similarity.
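
To illustrate the comparison problem, here is a minimal sketch. The toy data,
the choice of 20 bins and the use of the combined range are all my own
assumptions. Automatic binning chooses edges separately for each data set,
whereas explicit shared edges give counts that line up bin by bin:

```python
import numpy as np

rng = np.random.default_rng(1)
a = rng.normal(loc=0.0, scale=1.0, size=200)   # toy data set A
b = rng.normal(loc=0.5, scale=2.0, size=5000)  # toy data set B

# Automatic binning: edges are chosen separately for each data set,
# so the two histograms are generally not defined on the same bins.
_, edges_a = np.histogram(a, bins="auto")
_, edges_b = np.histogram(b, bins="auto")

# Explicit shared edges over the combined range (20 bins is arbitrary).
lo = min(a.min(), b.min())
hi = max(a.max(), b.max())
shared_edges = np.linspace(lo, hi, 21)

counts_a, _ = np.histogram(a, bins=shared_edges)
counts_b, _ = np.histogram(b, bins=shared_edges)
```

With shared edges, counts_a[i] and counts_b[i] refer to the same interval,
which is what any bin-wise comparison needs.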
_______________________________________________
NumPy-Discussion mailing list -- [email protected]