On Mon, Aug 30, 2010 at 2:43 PM, Benjamin Root <ben.r...@ou.edu> wrote:
> On Mon, Aug 30, 2010 at 10:50 AM, <josef.p...@gmail.com> wrote:
>>
>> On Mon, Aug 30, 2010 at 11:39 AM, Bruce Southey <bsout...@gmail.com> wrote:
>> > On 08/30/2010 09:19 AM, Benjamin Root wrote:
>> >
>> > On Mon, Aug 30, 2010 at 8:29 AM, David Huard <david.hu...@gmail.com> wrote:
>> >>
>> >> Thanks for the feedback,
>> >> As far as I understand it, the proposition is to keep histogram as it is
>> >> for 1.5, then in 2.0, deprecate normed=True but keep the buggy behavior,
>> >> while adding a density keyword that fixes the bug. In a later release, we
>> >> could then get rid of normed. While the bug won't be present in histogramdd
>> >> and histogram2d, the keyword change should be mirrored in those functions
>> >> as well.
>> >> I personally am not too keen on changing the keyword normed for density. I
>> >> feel we are trading clarity for a few new users against additional trouble
>> >> for many existing users. We could mitigate this by first documenting the
>> >> change in the docstring and live with both keywords for a few years before
>> >> raising a DeprecationWarning.
>> >> Since this has a direct impact on matplotlib's hist, I'd be keen to hear
>> >> from the devs on this.
>> >> David
>> >
>> > I am not a dev, but I would like to give a word of warning from matplotlib.
>> >
>> > In matplotlib, the bar/hist family of functions grew organically as the devs
>> > took on various requests to add keywords and such to modify the style and
>> > behavior of those graphing functions. It has now become an unmaintainable
>> > mess, prompting discussions on how to rip it out and replace it with a
>> > cleaner implementation. While everyone agrees that it needs to be done,
>> > none of us wants to break backwards compatibility.
>> >
>> > My personal feeling is that a function should do one thing, and do that one
>> > thing well. So, to me, that means that histogram() should return an array
>> > of counts and the bins for those counts. Anything more is merely window
>> > dressing to me. With this information, one can easily compute a cumulative
>> > distribution function, and/or normalize the result. The idea is that if
>> > there is nothing special that needs to be done within the histogram
>> > algorithm to accommodate these extra features, then they belong outside the
>> > function.
>> >
>> > My 2 cents,
>> > Ben Root
>> >
>> > _______________________________________________
>> > NumPy-Discussion mailing list
>> > NumPy-Discussion@scipy.org
>> > http://mail.scipy.org/mailman/listinfo/numpy-discussion
>> >
>> > +1 for Ben's approach.
>> > This is very similar to my view regarding the contingency table class
>> > proposed for scipy ( http://projects.scipy.org/scipy/ticket/1258). We need
>> > to provide the core functionality that other approaches such as density
>> > estimation can use but not be limited to specific details.
>>
>> I think (a corrected) density histogram is core functionality for
>> unequal bin lengths.
>>
>> The graph with raw count in the case of unequal bin sizes would be
>> quite misleading when plotted and interpreted on the real line and not
>> on discrete points (shaded areas instead of vertical lines). And as
>> the origin of this thread showed, it's not trivial to figure out what
>> the correct normalization is.
>> So, I think, if we drop the density normalization, we just need a new
>> function that does it.
>>
>> My 2c,
>>
>> Josef
>>
>>
>
> Why not a function that takes the output of a core histogram and produces a
> correct density normalization? Such a function would be useful elsewhere, I
> imagine.
>
> Of course there are a lot of legacy issues to consider, but if we introduce
> such a function first with documentation in histogram() showing how to
> produce a normalized density, we can then keep some of the bad code for now
> for backwards compatibility with notes saying that some of the stuff will be
> deprecated. Especially point out in the docs where the current code fails
> to produce the correct results.
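For concreteness, here is a minimal sketch of the kind of helper suggested above. The name density_from_counts is hypothetical (it is not an existing NumPy function), and the sample data and bin edges are made up for illustration. It assumes the counts and edges come straight from np.histogram, and it divides each count by the total count times that bin's own width, so unequal bins are handled:

```python
import numpy as np


def density_from_counts(counts, bin_edges):
    """Hypothetical helper, not part of NumPy: convert raw counts to a density.

    Each count is divided by (total count * width of its own bin), so the
    resulting step function integrates to 1 even for unequal bin widths.
    """
    counts = np.asarray(counts, dtype=float)
    widths = np.diff(bin_edges)  # one width per bin
    total = counts.sum()
    if total == 0:
        raise ValueError("cannot normalize an empty histogram")
    return counts / (total * widths)


# Illustrative data with deliberately unequal bin widths (1, 1, 2, 6).
data = np.array([0.5, 1.5, 1.7, 2.5, 3.0, 7.0, 9.0])
edges = np.array([0.0, 1.0, 2.0, 4.0, 10.0])

counts, edges = np.histogram(data, bins=edges)
density = density_from_counts(counts, edges)

# The area under the density (height * width, summed over bins) should be 1.
area = np.sum(density * np.diff(edges))
```

Nothing here needs access to the internals of histogram(); the returned counts and edges are enough.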
bugfix or redesign ?

My feature request for (or target for forking) the histogram functions
is to get the temporary results out, or get additional results, for
example the bin-number or quantization for each observation, or some
other things that I don't remember right now.

With histogram functions that only do histograms, we lose a lot of
calculations. This is, however, not really relevant for
calculating densities since the bin edges are returned.

Josef

>
> Ben Root
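As an illustration of the "bin-number for each observation" result mentioned above, one way to recover it today is to recompute it from the returned edges with np.digitize. This is only a sketch with made-up data; it assumes every observation lies inside [edges[0], edges[-1]], and it patches the right-most edge by hand because np.histogram treats the last bin as closed on the right while np.digitize does not:

```python
import numpy as np

# Illustrative data and edges; all observations lie within the edges.
data = np.array([0.5, 1.5, 1.7, 2.5, 3.0, 7.0, 9.0])
edges = np.array([0.0, 1.0, 2.0, 4.0, 10.0])

counts, _ = np.histogram(data, bins=edges)

# np.digitize returns i with edges[i-1] <= x < edges[i]; shift to 0-based bins.
bin_index = np.digitize(data, edges) - 1

# np.histogram puts x == edges[-1] in the last bin; mirror that here.
bin_index[data == edges[-1]] = len(edges) - 2

# Counting the per-observation bin numbers reproduces the histogram counts.
recounted = np.bincount(bin_index, minlength=len(edges) - 1)
```

This repeats the search over the edges that the histogram call has already done, which is the duplicated calculation referred to above.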