On Thu, Sep 2, 2010 at 5:31 PM, <josef.p...@gmail.com> wrote:

> On Thu, Sep 2, 2010 at 3:50 PM, Joe Kington <jking...@wisc.edu> wrote:
> > Hi all,
> >
> > I just wanted to check if this would be considered a bug.
> >
> > numpy.histogram does not appear to preserve subclasses of ndarrays (e.g.
> > masked arrays). This leads to considerable problems when working with
> > masked arrays. (As per this Stack Overflow question)
> >
> > E.g.
> >
> > import numpy as np
> > x = np.arange(100)
> > x = np.ma.masked_where(x > 30, x)
> >
> > counts, bin_edges = np.histogram(x)
> >
> > yields:
> > counts --> array([10, 10, 10, 10, 10, 10, 10, 10, 10, 10])
> > bin_edges --> array([ 0. ,  9.9, 19.8, 29.7, 39.6, 49.5, 59.4,
> >                      69.3, 79.2, 89.1, 99. ])
> >
> > I would have expected histogram to ignore the masked portion of the data.
> > Is this a bug, or expected behavior? I'll open a bug report, if it's not
> > expected behavior...
>
> If you want to ignore masked data it's just one extra function call
>
> histogram(m_arr.compressed())
>
> I don't think the fact that this makes an extra copy will be relevant,
> because I guess full masked array handling inside histogram will be a
> lot more expensive.
>
> Using asanyarray would also allow matrices in and other subtypes that
> might not be handled correctly by the histogram calculations.
>
> For anything else besides dropping masked observations, it would be
> necessary to figure out what the masked array definition of a
> histogram is, as Bruce pointed out.
>
> (Another interesting question would be if histogram handles nans
> correctly, searchsorted ???)
>
> Josef
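For the archives, here is a minimal sketch of the workaround Josef describes, applied to the example array from the original post. It assumes the only goal is to drop the masked values before binning; it is not a general definition of a masked-array histogram.

```python
import numpy as np

# Same example data as in the original post: values above 30 are masked.
x = np.arange(100)
x = np.ma.masked_where(x > 30, x)

# compressed() returns a plain 1-D ndarray holding only the unmasked values,
# so histogram never sees the masked portion of the data.
unmasked = x.compressed()
counts, bin_edges = np.histogram(unmasked)

print(counts.sum())                 # number of unmasked values that were binned
print(bin_edges[0], bin_edges[-1])  # range spanned by the unmasked data
```

With this, the bin range comes from the unmasked data (0 to 30) rather than from the full 0 to 99 range.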
Good points all around. I'll skip the enhancement request. Sorry for the noise!

Thanks!
-Joe

> > This would appear to be easily fixed by using asanyarray rather than asarray
> > within histogram. E.g. this diff for numpy/lib/function_base.py
> >
> > Index: function_base.py
> > ===================================================================
> > --- function_base.py (revision 8604)
> > +++ function_base.py (working copy)
> > @@ -132,9 +132,9 @@
> >
> >      """
> >
> > -    a = asarray(a)
> > +    a = asanyarray(a)
> >      if weights is not None:
> > -        weights = asarray(weights)
> > +        weights = asanyarray(weights)
> >          if np.any(weights.shape != a.shape):
> >              raise ValueError(
> >                      'weights should have the same shape as a.')
> > @@ -156,7 +156,7 @@
> >              mx += 0.5
> >          bins = linspace(mn, mx, bins+1, endpoint=True)
> >      else:
> > -        bins = asarray(bins)
> > +        bins = asanyarray(bins)
> >          if (np.diff(bins) < 0).any():
> >              raise AttributeError(
> >                      'bins must increase monotonically.')
> >
> > Thanks!
> > -Joe
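To make the asarray/asanyarray distinction behind the quoted diff concrete, here is a small sketch using the same example array. It only shows how the two conversion functions treat a masked array; it makes no claim about how histogram itself would behave with the patch applied.

```python
import numpy as np

x = np.arange(100)
x = np.ma.masked_where(x > 30, x)

# asarray converts to a base-class ndarray, so the mask is discarded
# and all 100 underlying values become visible again.
plain = np.asarray(x)

# asanyarray passes ndarray subclasses through, so the result is still
# a MaskedArray and reductions such as max() respect the mask.
kept = np.asanyarray(x)

print(type(plain).__name__, plain.max())
print(type(kept).__name__, kept.max())
```

As Josef notes, keeping the subclass also lets in matrices and other subtypes that the histogram calculations might not handle correctly.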
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion