Re: [Numpy-discussion] Possible bug in indexed masked arrays

Pierre GM Mon, 05 Apr 2010 00:08:20 -0700

On Apr 2, 2010, at 1:08 AM, Nathaniel Peterson wrote:
> 
> Is this behavior of masked arrays intended, or is it a bug?


It's not a bug, it's an unfortunate side effect of using boolean masked arrays 
for indices. Don't. Instead, you should fill the masked arrays with either True 
or False (depending on what you want).
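
As a minimal sketch of that advice (assuming Python 3 and a recent NumPy; the names drop_masked and keep_masked are just illustrative), you can use the filled method to decide explicitly what a masked comparison should count as before indexing. The arrays from the original question are rebuilt here so the snippet stands alone:

```python
import numpy as np

a = np.ma.fix_invalid(np.array([np.nan, -1, 0, 1]))
c = np.ma.fix_invalid(np.array([2.0, -1, 0, 1]))
idx2 = (a == c)   # masked boolean array; its first entry is masked

# Decide explicitly what a masked comparison should mean, then index
# with the resulting plain boolean ndarray.
drop_masked = idx2.filled(False)  # masked entries count as "no match"
keep_masked = idx2.filled(True)   # masked entries count as "match"

print(a[drop_masked].shape)  # (3,)
print(a[keep_masked].shape)  # (4,)
```

Either way, the index is an ordinary boolean ndarray, so the result no longer depends on whatever value happens to sit underneath a masked entry.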

Now, for some explanations:

> import numpy as np
> a=np.ma.fix_invalid(np.array([np.nan,-1,0,1]))
> b=np.ma.fix_invalid(np.array([np.nan,-1,0,1]))

When using ma.fix_invalid, the nans and infs are masked and the corresponding 
data entries are set to a default fill value (1e+20 for floats). Thus, you have:
>>> print a.data
[  1.00000000e+20  -1.00000000e+00   0.00000000e+00   1.00000000e+00]
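
If you want to check this yourself, here is a small sketch (assuming Python 3 and a recent NumPy) that looks at the mask and the fill value alongside the data:

```python
import numpy as np

a = np.ma.fix_invalid(np.array([np.nan, -1, 0, 1]))

# The nan is masked, and its slot in the underlying data is overwritten
# with the default fill value for floats.
print(a.mask)        # only the first entry is masked
print(a.data)        # underlying ndarray; first entry holds the fill value
print(a.fill_value)  # default fill value for a float array
```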

> idx=(a==b)

Now, you compare two masked arrays. In practice, the arrays are first filled 
with 0, compared, and the mask is created afterwards. In the current case, we 
get a new masked array, whose first entry is masked (because a[0] is masked), 
and because the two underlying ndarrays are identical, the underlying ndarray 
of the result is [True  True  True  True].

> print(a[idx][3])
> # 1.0


The fun starts now: you are using idx, a masked array, as indices. Because the 
fancy indexing mechanism of numpy doesn't know how to process masked arrays, 
their underlying ndarrays are used instead. Consider a[idx] equivalent to 
a[np.array(idx)]. Because np.array(idx) == idx.data == 
[True  True  True  True], a[idx] returns a, hence the (4,) shape.
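
The conversion itself is easy to check. The sketch below (again assuming Python 3 and a recent NumPy; the name plain is just illustrative) only shows that turning idx into a plain ndarray drops the mask. It makes no claim about which values are stored underneath masked entries, since that may depend on the NumPy version:

```python
import numpy as np

a = np.ma.fix_invalid(np.array([np.nan, -1, 0, 1]))
b = np.ma.fix_invalid(np.array([np.nan, -1, 0, 1]))
idx = (a == b)

plain = np.asarray(idx)      # plain ndarray view of the data, without the mask
print(type(idx).__name__)    # MaskedArray
print(type(plain).__name__)  # ndarray
print(idx.mask)              # the mask exists on idx but is absent from plain
print(np.array_equal(plain, idx.data))
```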

> But if I change the first element of b from np.nan to 2.0 then
> a[idx2] has shape (3,) despite np.alltrue(idx==idx2) being True:
> 
> c=np.ma.fix_invalid(np.array([2.0,-1,0,1]))
> idx2=(a==c)

So, c is a masked array without any masked values. When comparing a and c, the 
arrays are once again filled with 0 before the comparison. The ndarray  
underlying idx2 is therefore [False True True True], and the first item is 
masked (still because a[0] is masked). If you use idx2 for indexing, it's 
converted to an ndarray, and you end up with the last three items of a (hence 
the (3,) shape).

> assert(np.alltrue(idx==idx2))

Now, you compare the two masked arrays idx and idx2. Remember the filling with 
0 that happens under the hood, so you end up comparing [False True True True] 
and [False True True True] with np.alltrue, which of course returns True...

Moral of the story: don't use masked arrays in fancy indexing, as you may not 
get what you expect.
I hope it clarified the situation a bit, but don't hesitate to ask more 
questions.
Cheers
P.

