Hi all,

This thread concerns the GitHub issues raised in #11879, #10297, and #8203, and possibly others that didn't turn up in my search.
The main issue is that histogram(bins='auto') sometimes raises a MemoryError when the number of automatically generated bin edges is too large. In every documented case, the oversized bin count occurs when the 'auto' estimator ends up using the 'fd' (Freedman-Diaconis) method. Because the FD bin width is derived from the interquartile range while the number of bins scales with the full data range, a few extreme outliers can make the bin count enormous.

I have taken a crack at reducing the number of bins produced when bins='auto' is passed to numpy's histogram function. Following suggestions from eric-wieser, the approach merges empty bins. The method works for the sample datasets in all the issues related to the FD estimator (#11879, #10297, #8203). Note that this method produces unequal bin widths.

You can see some code I've already written in a comment on issue #11879 at the link below:
https://github.com/numpy/numpy/issues/11879#issuecomment-516686087

Thanks for reading this far. I'd be happy to turn this into a PR if there is interest.

-areeves87
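For readers who don't follow the link, here is a minimal sketch of the empty-bin-merging idea. It is not the code from the linked comment: `merged_fd_histogram` is a hypothetical helper name, and the sketch assumes the Freedman-Diaconis width is computed as 2 * IQR / n**(1/3), the same rule `np.histogram_bin_edges(..., bins='fd')` uses. Rather than materializing every equal-width bin, it assigns each sample an integer bin index, keeps only the occupied indices via `np.unique`, and collapses each run of empty bins between them into a single zero-count bin, so memory scales with the sample size instead of with (max - min) / width.

```python
import numpy as np


def merged_fd_histogram(a):
    """Hypothetical helper: FD-width histogram where runs of empty bins are merged.

    Returns (hist, edges) like np.histogram, but the edges are unequally spaced.
    """
    a = np.asarray(a, dtype=float).ravel()
    lo, hi = a.min(), a.max()
    if lo == hi:
        # Zero range: widen by 0.5 on each side, as np.histogram does
        return np.array([a.size]), np.array([lo - 0.5, hi + 0.5])

    # Freedman-Diaconis bin width: 2 * IQR / n**(1/3)
    q75, q25 = np.percentile(a, [75, 25])
    width = 2.0 * (q75 - q25) / a.size ** (1.0 / 3.0)
    if width == 0:
        # Zero IQR: a real implementation would fall back to another estimator;
        # this sketch simply uses a single bin
        return np.array([a.size]), np.array([lo, hi])

    # Number of bins the plain equal-width FD grid would need (a scalar, never allocated)
    n_full = int(np.ceil((hi - lo) / width))
    # Integer index of the equal-width bin each sample falls into;
    # the maximum is clipped into the last bin, which is closed on the right
    idx = np.minimum(np.floor((a - lo) / width).astype(np.int64), n_full - 1)
    occupied, counts = np.unique(idx, return_counts=True)

    edges = [lo]
    hist = []
    for i, k in enumerate(occupied):
        if i > 0 and k > occupied[i - 1] + 1:
            # Collapse the run of empty bins before this occupied bin into one zero-count bin
            edges.append(lo + k * width)
            hist.append(0)
        edges.append(lo + (k + 1) * width)
        hist.append(counts[i])
    edges[-1] = hi  # end exactly at the data maximum
    return np.array(hist), np.array(edges)


a = np.array([0, 1, 2, 3, 4, 5, 6, 7, 1000])
hist, edges = merged_fd_histogram(a)
print(hist)  # [4 4 0 1]
```

On this toy input with one outlier, the plain FD grid would need 261 equal-width bins, while the merged version needs only 4, one of which is a single wide empty bin. The returned edges can be passed straight back to np.histogram, which accepts unequal bin widths.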
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion