On Sun, Apr 12, 2015 at 9:45 AM, Jaime Fernández del Río <
[email protected]> wrote:

> On Sun, Apr 12, 2015 at 12:19 AM, Varun <[email protected]> wrote:
>
>>
>> http://nbviewer.ipython.org/github/nayyarv/matplotlib/blob/master/examples/statistics/Automating%20Binwidth%20Choice%20for%20Histogram.ipynb
>>
>> Long story short, histogram visualisations that depend on numpy (such as
>> matplotlib, or nearly all of them) have poor default behaviour: I have to
>> constantly play around with the number of bins to get a good idea of what
>> I'm looking at. The default bins=10 works OK for up to 1000 points or very
>> normal data, but performs poorly for anything else and doesn't account
>> for variability either. I don't have a method easily available to scale
>> the number of bins given the data.
>>
>> R doesn't suffer from these problems and provides methods for use with
>> its hist method. I would like to provide similar functionality for
>> matplotlib, to at least provide some kind of good starting point, as
>> histograms are very useful for initial data discovery.
>>
>> The notebook above provides an explanation of the problem as well as some
>> proposed alternatives. Use different datasets (type and size) to see the
>> performance of the suggestions. All of the methods proposed exist in R
>> and in the literature.
>>
>> I've put together an implementation to add this new functionality, but am
>> hesitant to make a pull request as I would like some feedback from a
>> maintainer before doing so.
>>
>
> +1 on the PR.
>

+1 as well.

Unfortunately we can't change the default of 10, but a number of string
methods, with a "bins=auto" or some such name prominently recommended in
the docstring, would be very good to have.
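
As a rough sketch of what such a string option might compute, here are two
of the rules from the literature (Sturges and Freedman-Diaconis) in plain
numpy. The names sturges_bins, fd_bins and choose_bins, and the "method"
keyword, are hypothetical and only for illustration; this is not the
proposed implementation and not an existing numpy API.

```python
import numpy as np


def sturges_bins(x):
    """Sturges' rule: ceil(log2(n)) + 1 bins (hypothetical helper)."""
    return int(np.ceil(np.log2(x.size))) + 1


def fd_bins(x):
    """Freedman-Diaconis rule: bin width = 2 * IQR / n**(1/3) (hypothetical helper)."""
    q75, q25 = np.percentile(x, [75, 25])
    width = 2.0 * (q75 - q25) / x.size ** (1.0 / 3.0)
    if width == 0:
        # Degenerate data (zero IQR): fall back to a single bin.
        return 1
    return max(1, int(np.ceil((x.max() - x.min()) / width)))


def choose_bins(data, method="fd"):
    """Return a bin count for `data` using the named rule (assumed interface)."""
    x = np.asarray(data, dtype=float).ravel()
    if x.size == 0:
        raise ValueError("cannot choose a bin count for empty data")
    rules = {"sturges": sturges_bins, "fd": fd_bins}
    if method not in rules:
        raise ValueError("unknown method: %r" % (method,))
    return rules[method](x)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    sample = rng.normal(size=5000)
    nbins = choose_bins(sample, method="fd")
    # The resulting integer can be passed straight to np.histogram.
    counts, edges = np.histogram(sample, bins=nbins)
    print(nbins, counts.sum())
```

The point of the sketch is only that the bin count grows with the sample
size and, for Freedman-Diaconis, shrinks as the spread (IQR) grows, which a
fixed default of 10 cannot do.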

Ralf
_______________________________________________
NumPy-Discussion mailing list
[email protected]
http://mail.scipy.org/mailman/listinfo/numpy-discussion
