Hi everyone,

For certain input arrays with very little variation, np.histogram with
automatic binning can select an enormous number of bins and fail with an
out-of-memory error. A minimal example:

import numpy as np
e = 1 + 1e-12  # differs from 1 by only 1e-12
Z = [0, 1, 1, 1, 1, 1, e, e, e, e, e, e, 2]
np.histogram(Z, bins="auto")  # can fail with an out-of-memory error
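
To see where the bin count comes from without triggering the allocation, here
is a rough pure-Python sketch. It assumes the documented formulas for the "fd"
(width 2 * IQR / n^(1/3)) and "sturges" (log2(n) + 1 bins) estimators, and that
"auto" picks the smaller of the two widths. The helper names percentile and
estimated_bin_counts are made up for this illustration; this is not NumPy's
internal code.

```python
import math

def percentile(sorted_vals, q):
    """Linearly interpolated percentile (q in [0, 100]) of a sorted list."""
    pos = q / 100 * (len(sorted_vals) - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (pos - lo) * (sorted_vals[hi] - sorted_vals[lo])

def estimated_bin_counts(data):
    """Return (fd_bins, sturges_bins, auto_bins) without allocating any bins.

    Assumes the data contains at least two distinct values (span > 0).
    """
    x = sorted(data)
    n = len(x)
    span = x[-1] - x[0]
    iqr = percentile(x, 75) - percentile(x, 25)
    fd_width = 2 * iqr / n ** (1 / 3)
    sturges_width = span / (math.log2(n) + 1)
    # Assumed "auto" rule: the smaller width wins; fall back to Sturges if IQR == 0.
    auto_width = min(fd_width, sturges_width) if fd_width > 0 else sturges_width

    def to_bins(width):
        return math.ceil(span / width)

    return to_bins(fd_width), to_bins(sturges_width), to_bins(auto_width)

e = 1 + 1e-12
Z = [0, 1, 1, 1, 1, 1, e, e, e, e, e, e, 2]
fd_bins, sturges_bins, auto_bins = estimated_bin_counts(Z)
print(fd_bins, sturges_bins, auto_bins)
```

For the example above the interquartile range is about 1e-12 while the data
range is 2, so the "fd" width yields more than 10^12 bins, whereas Sturges
alone would give 5.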

There is a proposal to change the automatic bin selection to avoid this:
https://github.com/numpy/numpy/pull/28426. The aim is to stay close to the
original algorithm while avoiding out-of-memory errors for input with small
variance. It passes the unit tests, but since this is a user-visible change, we
would like some more input.
In particular:

* What are the expectations of the auto binning algorithm?
* What is a reasonable maximum number of bins for a sample of size n?
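
To make the second question a bit more concrete, the sketch below tabulates a
few possible upper bounds as a function of n. The sqrt, Sturges, and Rice
counts follow the rules documented for the corresponding np.histogram
estimators; using them (or n itself) as a cap is only an assumption for
discussion and not necessarily what the pull request does. The function name
candidate_caps is hypothetical.

```python
import math

def candidate_caps(n):
    """Return some candidate upper bounds on the bin count for sample size n."""
    return {
        "sqrt": math.ceil(math.sqrt(n)),         # square-root rule
        "sturges": math.ceil(math.log2(n) + 1),  # Sturges' rule
        "rice": math.ceil(2 * n ** (1 / 3)),     # Rice rule
        "n": n,                                  # at most one bin per sample
    }

for n in (13, 1_000, 1_000_000):
    print(n, candidate_caps(n))
```

Any of these grows far more slowly than the bin count produced by the minimal
example, so even the loosest one (n) would avoid the out-of-memory case.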

With kind regards,
Pieter Eendebak
