Yes, I understand there are alternatives, but I still think a simple
binned histogram is a fairly basic feature.
KDEs are nice but are easily over-tweaked: if I see one, I want to know
how the bandwidth was selected. Otherwise it is no better than a
histogram, and arguably worse, since the tuning choice is now hidden.
CDFs (essentially your second suggestion) can be useful, but some kinds
of data are traditionally shown as histograms, and a CDF would only
confuse readers.
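
To make the bandwidth point concrete, here is a minimal sketch, assuming
a hand-written Gaussian kernel in NumPy rather than any particular
library's KDE. The function names, the synthetic data and the two
bandwidth values are all made up for illustration:

```python
import numpy as np


def gaussian_kde_on_grid(samples, grid, bandwidth):
    """Evaluate a plain Gaussian kernel density estimate on `grid`."""
    # One row per grid point, one column per sample.
    z = (grid[:, None] - samples[None, :]) / bandwidth
    kernel = np.exp(-0.5 * z**2) / np.sqrt(2.0 * np.pi)
    return kernel.sum(axis=1) / (len(samples) * bandwidth)


def count_local_maxima(density):
    """Count grid points strictly higher than both neighbours."""
    inner = density[1:-1]
    return int(np.sum((inner > density[:-2]) & (inner > density[2:])))


# Synthetic bimodal data (an assumption for illustration, not real data).
rng = np.random.default_rng(0)
samples = np.concatenate([
    rng.normal(-2.0, 1.0, 100),
    rng.normal(2.0, 1.0, 100),
])
grid = np.linspace(-8.0, 8.0, 400)

# Same data, two arbitrary bandwidths.
peaks_narrow = count_local_maxima(gaussian_kde_on_grid(samples, grid, 0.05))
peaks_wide = count_local_maxima(gaussian_kde_on_grid(samples, grid, 3.0))
print(peaks_narrow, peaks_wide)
```

The small bandwidth produces many more peaks than the large one on the
same points, so the shape a reader sees depends on a parameter that is
usually not shown on the plot.
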
Antony
2014-05-30 15:11 GMT-07:00 Mark Voorhies <mark.voorh...@ucsf.edu>:
> On 05/30/2014 08:25 AM, Antony Lee wrote:
>
>> I may still need to bin data, e.g. when the data range is "large", or at
>> least not small compared to the number of data points.
>> Antony
>>
>
> Two alternatives to histograms that you might consider:
>
> Kernel density estimation (KDE)
>
> * This blog post has a good discussion motivating KDE from issues with bin
> choice in histograms:
> http://www.mglerner.com/blog/?p=28
> * And this follow up explores the various KDE implementations in the
> "Scientific Python" stack:
> http://jakevdp.github.io/blog/2013/12/01/kernel-density-estimation/
>
> A rank vs. value plot, e.g.:
>
> plot(sorted(r))
>
> This is horizontal for peaks (lots of copies of similar values) and
> vertical for tails/gaps,
> so it presents the same information as a histogram, but without requiring
> bin choice.
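
The "horizontal for peaks, vertical for gaps" reading can be checked
numerically. This is a minimal sketch on made-up integer data (the array
r below is an assumption, not data from this thread):

```python
import numpy as np

# Made-up data: a cluster at 1, a single 2, then a gap up to a cluster at 9.
r = np.array([9, 1, 1, 9, 2, 1, 9, 1])

ranked = np.sort(r)      # what plot(sorted(r)) would draw on the y axis
steps = np.diff(ranked)  # rise between consecutive ranks

print(ranked)
print(steps)
```

Zero steps are the flat stretches (repeated values, i.e. peaks), and the
one large step marks the gap between the two clusters.
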
>
> --Mark
>
>
>
>>
>> 2014-05-30 5:03 GMT-07:00 Yoshi Rokuko <yo...@rokuko.net>:
>>
>> On Thu, 29 May 2014 14:14:52 -0700
>>> Antony Lee <antony....@berkeley.edu> wrote:
>>>
>>> Hi,
>>>> When histogramming integer data, is there an easy way to tell
>>>> matplotlib that I want a certain number of bins, and each bin to
>>>> cover an equal number of integers (except possibly the last one)?
>>>> (in order to avoid having some bins higher than others merely because
>>>> they cover more integers) I know I can pass in an explicit bins array
>>>> (something like list(range(min, max, (max-min)//n)) + [max]) but I was
>>>> hoping for something simpler, like hist(data, nbins=42,
>>>> equal_integer_coverage=True). Best,
>>>> Antony
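
For the archive, the explicit-edges approach can be wrapped in a few
lines. This is a minimal sketch, assuming a hypothetical helper named
integer_bin_edges (not a matplotlib or NumPy function). It puts the
edges at half-integers so every integer falls inside exactly one bin,
and it may return fewer than nbins bins when the range does not divide
evenly:

```python
import math

import numpy as np


def integer_bin_edges(lo, hi, nbins):
    """Edges for bins that each cover the same number of integers.

    Hypothetical helper: covers the integers lo..hi inclusive with at
    most `nbins` bins; only the last bin may cover fewer integers.
    """
    span = hi - lo + 1                 # how many integers to cover
    width = math.ceil(span / nbins)    # integers per bin
    edges = np.arange(lo - 0.5, hi + 0.5, width)
    return np.append(edges, hi + 0.5)  # close the last bin


# Made-up integer data in the range 0..9.
data = np.array([0, 0, 1, 2, 3, 3, 3, 4, 5, 6, 7, 8, 9, 9])
edges = integer_bin_edges(int(data.min()), int(data.max()), 4)
counts, _ = np.histogram(data, bins=edges)

print(edges)
print(counts)
```

The resulting edges can be passed directly as hist(data, bins=edges).
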
>>>>
>>>
>>> Integer data is discrete. For discrete variables you don't need bins:
>>> you don't estimate the frequency distribution, you know it exactly by
>>> counting.
>>>
>>> Of course you could do that with the hist function:
>>>
>>> pl.hist(r, np.arange(min(r)-0.5, max(r)+1.5), histtype='step')
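
A quick check that these unit-width bins really are plain counting. This
is a minimal sketch on a made-up array r, using np.histogram in place of
the plotting call so it runs without a display:

```python
import numpy as np

# Made-up integer data (an assumption for illustration).
r = np.array([3, 5, 5, 6, 6, 6, 9])

# Same edges as in the hist call above: one bin centred on each integer.
edges = np.arange(r.min() - 0.5, r.max() + 1.5)
counts, _ = np.histogram(r, bins=edges)

# Exact counts per integer value, from min(r) to max(r).
exact = np.bincount(r - r.min())

print(counts)
print(exact)
```

Both arrays hold one entry per integer from min(r) to max(r), and they
agree, including the zeros for values that never occur.
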
>>>
>>> ------------------------------------------------------------
>>> ------------------
>>> Time is money. Stop wasting it! Get your web API in 5 minutes.
>>> www.restlet.com/download
>>> http://p.sf.net/sfu/restlet
>>> _______________________________________________
>>> Matplotlib-users mailing list
>>> Matplotlib-users@lists.sourceforge.net
>>> https://lists.sourceforge.net/lists/listinfo/matplotlib-users
>>>
>>>