[Numpy-discussion] Re: Automatic binning for np.histogram

Ronald van Elburg Fri, 07 Mar 2025 07:56:54 -0800

I don't think there is an automatic method for correct binning.

The methods mentioned in the pull request and related issue are all based on 
the assumption that the underlying distribution is Gaussian. There is 
absolutely no reason to assume that.


Reasonable expectations for automatic binning:
    - it will be wrong most of the time.

Reasonable number of bins for a sample of size n:
    - max(10, sqrt(n)), so that a large number of bins are filled while the
      histogram still provides information about the data values for small n.
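
As a rough sketch of that rule of thumb: the helper name suggested_bin_count
is made up for illustration and is not part of NumPy, and rounding sqrt(n) up
to an integer is my assumption, since np.histogram needs an integer bin count.

```python
import math

import numpy as np


def suggested_bin_count(n):
    """Rule of thumb from this message: max(10, sqrt(n)).

    Rounding up is an assumption; the rule itself does not say how to round.
    """
    return max(10, math.ceil(math.sqrt(n)))


rng = np.random.default_rng(0)
sample = rng.exponential(size=400)  # deliberately non-Gaussian data

bins = suggested_bin_count(sample.size)
counts, edges = np.histogram(sample, bins=bins)
```

For small samples the floor of 10 applies, so suggested_bin_count(25) gives 10
rather than 5.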
    
The documentation could point out that automatic binning should only be used
for exploring a single data set, as it is unsuited for comparing two different
data sets. Automatically binned data is also unsuited for later use in testing
distribution similarity.
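
A minimal sketch of why this matters, assuming two made-up samples: with
bins="auto" each data set gets its own edges, so the counts cannot be compared
bin by bin. Fixing one set of edges up front avoids that. The choice of 20
bins over the combined range is arbitrary and only for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
a = rng.normal(0.0, 1.0, size=200)
b = rng.normal(0.5, 2.0, size=1000)

# Automatic binning picks edges per data set, so the bins do not line up.
edges_a = np.histogram_bin_edges(a, bins="auto")
edges_b = np.histogram_bin_edges(b, bins="auto")

# One fixed set of edges, chosen once and applied to both samples.
lo = min(a.min(), b.min())
hi = max(a.max(), b.max())
shared_edges = np.linspace(lo, hi, 21)  # 21 edges give 20 bins

counts_a, _ = np.histogram(a, bins=shared_edges)
counts_b, _ = np.histogram(b, bins=shared_edges)
```

With shared_edges, counts_a[i] and counts_b[i] refer to the same interval,
which is what a bin-by-bin comparison or a similarity test needs.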
