That is a pretty good explanation. I find myself convinced by Matthew's arguments. I think that being able to separate ABSENT from IGNORED is a good idea. I also like being able to control SKIP and PROPAGATE (but I think the current implementation allows this already).
What is the counter-argument to this proposal?

-Travis

On Oct 27, 2011, at 7:31 PM, Matthew Brett wrote:

> Hi,
>
> On Tue, Oct 25, 2011 at 7:56 PM, Travis Oliphant
> wrote:
>> So, I am very interested in making sure I remember the details of the
>> counterproposal.  What I recall is that you wanted to be able to
>> differentiate between a "bit-pattern" mask and a boolean-array mask in the
>> API.  I believe currently even when bit-pattern masks are implemented the
>> difference will be "hidden" from the user on the Python level.
>>
>> I am sure to be missing other parts of the discussion as I have been in and
>> out of it.
>
> The ideas
> --------------
>
> The question that we were addressing in the alter-NEP was: should
> missing values implemented as bitpatterns appear to be the same as
> missing values implemented with masks?  We said no, and Mark said yes.
>
> To restate the argument in brief; Nathaniel and I and some others
> thought that there were two separable ideas in play:
>
> 1) A value that is finally and completely missing. == ABSENT
> 2) A value that we would like to ignore for the moment but might want
> back at some future time == IGNORED
>
> (I'm using the adjectives ABSENT and IGNORED here to be short for the
> objects 'absent value' and 'ignored value'.  This is to distinguish
> from the verbs below).
>
> We thought bitpatterns were a good match for the former, and masking
> was a good match for the latter.
>
> We all agreed there were two things you might like to do with values
> that were missing in both senses above:
>
> A) PROPAGATE; V + 1 == V
> B) SKIP; K + 1 == 1
>
> (Note verbs for the behaviors).
>
> I believe the original np.ma masked arrays always SKIP.
>
> In [2]: a = np.ma.masked_array?
> In [3]: a = np.ma.masked_array([99, 2], mask=[True, False])
> In [4]: a
> Out[4]:
> masked_array(data = [-- 2],
>              mask = [ True False],
>        fill_value = 999999)
> In [5]: a.sum()
> Out[5]: 2
>
> There was some discussion as to whether there was a reason to think
> that ABSENT should always or by default PROPAGATE, and IGNORED should
> always or by default SKIP.  Chuck is referring to this idea when he
> said further up this thread:
>
>> For instance, I'm thinking skipna=1 is the natural default for the masked
>> arrays.
>
> The current implementation
> ---------------------------------------
>
> What we have now is an implementation of masked arrays, but more
> tightly integrated into the numpy core.  In our language we have an
> implementation of IGNORED that is tuned to be nearly indistinguishable
> from the behavior we are expecting of ABSENT.
>
> Specifically, once you have done this:
>
> In [9]: a = np.array([99, 2], maskna=True)
>
> you can get something representing the mask:
>
> In [11]: np.isna(a)
> Out[11]: array([False, False], dtype=bool)
>
> but I believe there is no way of setting the mask directly.
> In order to set the mask, you have to do what looks like an assignment:
>
> In [12]: a[0] = np.NA
> In [14]: a
> Out[14]: array([NA, 2])
>
> In fact, what has happened is the mask has changed, but the underlying
> value has not:
>
> In [18]: orig = np.array([99, 2])
>
> In [19]: a = orig.view(maskna=True)
>
> In [20]: a[0] = np.NA
>
> In [21]: a
> Out[21]: array([NA, 2])
>
> In [22]: orig
> Out[22]: array([99, 2])
>
> This is different from real assignment:
>
> In [23]: a[0] = 0
>
> In [24]: a
> Out[24]: array([0, 2], maskna=True)
>
> In [25]: orig
> Out[25]: array([0, 2])
>
> Some effort has gone into making it difficult to pull off the mask:
>
> In [30]: a.view(np.int64)
> Out[30]: array([NA, 2])
>
> In [31]: a.view(np.int64).flags
> Out[31]:
>   C_CONTIGUOUS : True
>   F_CONTIGUOUS : True
>   OWNDATA : False
>   MASKNA : True
>   OWNMASKNA : False
>   WRITEABLE : True
>   ALIGNED : True
>   UPDATEIFCOPY : False
>
> In [32]: a.astype(np.int64)
> ---------------------------------------------------------------------------
> ValueError                                Traceback (most recent call last)
> /home/mb312/<ipython-input-32-e7f3381c9692> in <module>()
> ----> 1 a.astype(np.int64)
>
> ValueError: Cannot assign NA to an array which does not support NAs
>
> The default behavior of the masked values is PROPAGATE, but they can
> be individually made to SKIP:
>
> In [28]: a.sum() # PROPAGATE
> Out[28]: NA(dtype='int64')
>
> In [29]: a.sum(skipna=True) # SKIP
> Out[29]: 2
>
> Where's the beef?
> -------------------------
>
> I personally still think that it is confusing to fuse the concept of:
>
> 1) Masked arrays
> 2) Arrays with bitpattern codes for missing
>
> and the concepts of
>
> A) ABSENT and
> B) IGNORED
>
> Consequences for current code
> --------------------------------------------
>
> Specifically, it still seems to me to make sense to prefer this:
>
> >>> a = np.array([99, 2], masking=True)
> >>> a.mask
> [ True, True ]
> >>> a.sum()
> 101
> >>> a.mask[0] = False
> >>> a.sum()
> 2
>
> It might make sense, as Chuck suggests, to change the default to
> 'skipna=True', and I'd further suggest renaming np.NA to np.IGNORED
> and 'skipna' to 'skipignored' for clarity.
>
> I still think the pseudo-assignment:
>
> In [20]: a[0] = np.NA
>
> is confusing, and should be removed.
>
> Later, should we ever have bitpatterns, there would be something like
> np.ABSENT.  This of course would make sense for assignment:
>
> In [20]: a[0] = np.ABSENT
>
> There would be another keyword argument 'skipabsent=False' such that,
> when this is False, the ABSENT values propagate.
>
> Honestly, I think that NA should be a synonym for ABSENT, and so
> should be removed until the dust has settled, and restored as (np.NA
> == np.ABSENT).
>
> And I think, these two ideas, of masking / IGNORED and bitpattern /
> ABSENT, would be much easier to explain.
>
> That's my best shot.
>
> Matthew
> _______________________________________________
> NumPy-Discussion mailing list
> http://mail.scipy.org/mailman/listinfo/numpy-discussion

---
Travis Oliphant
Enthought, Inc.
1-512-536-1057
http://www.enthought.com
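The separation Matthew proposes (IGNORED values live in a mask that sits beside the data, ABSENT values are bitpatterns stored in the data itself) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not NumPy code: the `IgnorableArray` class and the `IGNORED` marker are hypothetical, the `skipignored`/`skipabsent` keywords only borrow the names suggested in the message, a mask entry of True means "visible" (matching the `a.mask` example in the message), and a float NaN stands in for an ABSENT bitpattern.

```python
import math

# Assumption: a float NaN stands in for an ABSENT bitpattern stored in the data.
ABSENT = float("nan")

# Hypothetical marker returned when an IGNORED value is allowed to PROPAGATE.
IGNORED = object()


class IgnorableArray:
    """Hypothetical sketch: data plus a separate 'visible' mask.

    Hiding a value only flips its mask entry; the stored data is never
    overwritten, so an IGNORED value can be brought back later.
    """

    def __init__(self, data):
        self.data = list(data)
        self.mask = [True] * len(self.data)  # True = visible, as in a.mask above

    def sum(self, skipignored=True, skipabsent=False):
        total = 0
        for value, visible in zip(self.data, self.mask):
            if not visible:
                if skipignored:
                    continue  # SKIP: the hidden value does not contribute
                return IGNORED  # PROPAGATE: the result is itself IGNORED
            if skipabsent and math.isnan(value):
                continue  # SKIP an ABSENT value
            total += value  # without skipabsent, NaN PROPAGATES through addition
        return total


a = IgnorableArray([99, 2])
print(a.mask)                             # [True, True]
print(a.sum())                            # 101
a.mask[0] = False
print(a.sum())                            # 2
print(a.sum(skipignored=False) is IGNORED)  # True
print(a.data)                             # [99, 2]: the ignored value is kept

b = IgnorableArray([ABSENT, 2.0])
print(b.sum())                  # nan
print(b.sum(skipabsent=True))   # 2.0
```

Keeping the mask separate from the data is what makes IGNORED reversible in this sketch (setting `a.mask[0] = True` again restores the sum of 101), whereas the ABSENT stand-in replaces the stored value itself, so there is nothing to restore.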
