On 09/02/2010 02:50 PM, Joe Kington wrote:
Hi all,
I just wanted to check if this would be considered a bug.
numpy.histogram does not appear to preserve subclasses of ndarrays
(e.g. masked arrays). This leads to considerable problems when
working with masked arrays. (As per this Stack Overflow question
<http://stackoverflow.com/questions/3610040/how-to-create-the-histogram-of-an-array-with-masked-values-in-numpy>)
E.g.
import numpy as np
x = np.arange(100)
x = np.ma.masked_where(x > 30, x)
counts, bin_edges = np.histogram(x)
yields:
counts --> array([10, 10, 10, 10, 10, 10, 10, 10, 10, 10])
bin_edges --> array([ 0. , 9.9, 19.8, 29.7, 39.6, 49.5, 59.4,
69.3, 79.2, 89.1, 99. ])
I would have expected histogram to ignore the masked portion of the
data. Is this a bug, or expected behavior? I'll open a bug report,
if it's not expected behavior...
This would appear to be easily fixed by using asanyarray rather than
asarray within histogram. E.g. this diff for numpy/lib/function_base.py
Index: function_base.py
===================================================================
--- function_base.py (revision 8604)
+++ function_base.py (working copy)
@@ -132,9 +132,9 @@
     """
-    a = asarray(a)
+    a = asanyarray(a)
     if weights is not None:
-        weights = asarray(weights)
+        weights = asanyarray(weights)
         if np.any(weights.shape != a.shape):
             raise ValueError(
                     'weights should have the same shape as a.')
@@ -156,7 +156,7 @@
             mx += 0.5
         bins = linspace(mn, mx, bins+1, endpoint=True)
     else:
-        bins = asarray(bins)
+        bins = asanyarray(bins)
         if (np.diff(bins) < 0).any():
             raise AttributeError(
                     'bins must increase monotonically.')
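The difference the diff relies on can be seen in isolation. The sketch below only illustrates the conversion step (asarray versus asanyarray); whether the rest of histogram then honors the mask is a separate question it does not address.

```python
import numpy as np

x = np.ma.masked_where(np.arange(100) > 30, np.arange(100))

plain = np.asarray(x)     # base-class ndarray; the mask is dropped
kept = np.asanyarray(x)   # ndarray subclasses pass through unchanged

print(type(plain).__name__)   # ndarray
print(type(kept).__name__)    # MaskedArray

# The plain array exposes the masked values again; the masked one does not.
print(plain.max(), kept.max())   # 99 30
```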
Thanks!
-Joe
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
I would not call it a bug, as this is a known 'feature' of functions that
use np.asarray(). You are welcome to file an enhancement request, but there
are some issues that need to be addressed first.
Typical questions that come to mind are:
1) Should a user be warned that the input is a masked array?
2) Should histogram count the number of masked values?
3) What is the expected output when normed=True?
4) What type of array should the weights and bins arguments be?
5) What are the dimensions of the weights and bins arguments, since it only
needs to have the number of bins?
6) If the input array is masked, should the weights and bins arguments
also be masked arrays when applicable? If so, what happens if the masks
are in different locations between arrays?
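To make question 6 concrete, here is a sketch of one possible policy, not something numpy implements: treat an entry as missing if it is masked in either the data or the weights, and histogram only what remains. The helper name masked_histogram is hypothetical, and other policies (e.g. raising an error when the masks differ) would be equally defensible.

```python
import numpy as np

def masked_histogram(a, bins=10, weights=None):
    """Hypothetical helper: drop entries masked in `a` or in `weights`."""
    a = np.ma.asanyarray(a)
    keep = ~np.ma.getmaskarray(a)          # True where the data is valid
    if weights is not None:
        weights = np.ma.asanyarray(weights)
        if weights.shape != a.shape:
            raise ValueError('weights should have the same shape as a.')
        keep &= ~np.ma.getmaskarray(weights)   # union of the two masks
        weights = np.ma.getdata(weights)[keep]
    return np.histogram(np.ma.getdata(a)[keep], bins=bins, weights=weights)

# Data masks the values 7, 8, 9; weights mask only the first entry.
a = np.ma.masked_where(np.arange(10) > 6, np.arange(10))
w = np.ma.array(np.ones(10), mask=[1, 0, 0, 0, 0, 0, 0, 0, 0, 0])

counts, bin_edges = masked_histogram(a, bins=3, weights=w)
```

With this policy only the values 1 through 6 contribute, because index 0 is masked in the weights and 7 through 9 are masked in the data.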
Regards
Bruce