That sounds like a reasonable extension, but I think there are still cases where you want to treat the data as one uniform set when computing bins (e.g. when toggling between orthogonal subsets of the data), so it isn't really a useful replacement.
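A minimal sketch of that use case, purely illustrative (the `histogram_bin_edges` spelling is my assumption for the function under discussion, and the data is made up): choose the edges from the pooled data once, then histogram each subset against those shared edges so the counts stay comparable while toggling.

```python
import numpy as np

# Two "orthogonal" subsets of the same underlying data, e.g. split by category.
rng = np.random.default_rng(0)
a = rng.normal(0.0, 1.0, 500)
b = rng.normal(2.0, 1.0, 300)

# Treat everything as one uniform set when choosing the bins...
edges = np.histogram_bin_edges(np.concatenate([a, b]), bins="auto")

# ...then histogram each subset against the shared edges.
counts_a, _ = np.histogram(a, bins=edges)
counts_b, _ = np.histogram(b, bins=edges)
```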
I suppose this becomes relevant when `density` is passed to the individual histogram invocations. Does matplotlib handle that correctly for stacked histograms?

On Thu, Mar 15, 2018, 20:14 Nathaniel Smith <[email protected]> wrote:
> Instead of an nobs argument, maybe we should have a version that accepts
> multiple data sets, so that we have the full information and can improve
> the algorithm over time.
>
> On Mar 15, 2018 7:57 PM, "Thomas Caswell" <[email protected]> wrote:
>
>> Yes, I like the name.
>>
>> The primary use-case for Matplotlib is that our `hist` method can take in
>> a list of arrays and produce N histograms in one shot. Currently with
>> 'auto' we only use the first data set to sort out what the bins should be
>> and then re-use those for the rest of the data sets. This will let us get
>> the bins on the merged input, but I take Josef's point that this is not
>> actually what we want....
>>
>> Tom
>>
>> On Mon, Mar 12, 2018 at 11:35 PM <[email protected]> wrote:
>>
>>> On Mon, Mar 12, 2018 at 11:20 PM, Eric Wieser
>>> <[email protected]> wrote:
>>> >> Given that the bin selection is data driven, transferring them
>>> >> across datasets might not be so useful.
>>> >
>>> > The main application would be to compute bins across the union of all
>>> > datasets. This is already possible by using `np.histogram` and
>>> > discarding the first result, but that's super wasteful.
>>>
>>> assuming "union" means a combined dataset.
>>>
>>> If you stack datasets, then the number of observations will not be
>>> correct for individual datasets.
>>>
>>> In that case an additional keyword like nobs, or whatever name would
>>> be appropriate for numpy, would be useful, e.g. use the average number
>>> of observations across datasets.
>>> Auxiliary statistics like std could then be computed on the total
>>> dataset (if that makes sense, which would not be the case if the
>>> variance across datasets is larger than the variance within datasets).
>>>
>>> Josef
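To make the `density` concern above concrete, here is a rough sketch (not a statement of what matplotlib actually does; the normalization conventions, names, and the `histogram_bin_edges` spelling are my assumptions): with shared edges computed from the merged data, normalizing each dataset by its own size gives different heights than normalizing the whole stack by the total number of observations.

```python
import numpy as np

rng = np.random.default_rng(1)
datasets = [rng.normal(0.0, 1.0, 1000), rng.normal(3.0, 0.5, 200)]

# Shared edges from the merged data (the "wasteful" workaround mentioned
# above would be np.histogram(np.concatenate(datasets), bins='auto')[1]).
edges = np.histogram_bin_edges(np.concatenate(datasets), bins="auto")
widths = np.diff(edges)

# Per-dataset density: each histogram integrates to 1 on its own, which is
# what np.histogram(..., density=True) does with these fixed edges.
per_set = [np.histogram(d, bins=edges, density=True)[0] for d in datasets]

# Stacked density: normalize by the *total* number of observations instead,
# so the stacked bars integrate to 1 over the merged data.
total_n = sum(len(d) for d in datasets)
stacked = [np.histogram(d, bins=edges)[0] / (total_n * widths) for d in datasets]
```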
