Hi,

On Thu, Jun 30, 2011 at 5:03 PM, Pierre GM <pgmdevl...@gmail.com> wrote:
>
> On Jun 30, 2011, at 5:38 PM, Matthew Brett wrote:
>
>> Hi,
>>
>> On Thu, Jun 30, 2011 at 2:58 PM, Pierre GM <pgmdevl...@gmail.com> wrote:
>>>
>>> On Jun 30, 2011, at 3:31 PM, Matthew Brett wrote:
>>>> ################################################
>>>> An alternative-NEP on masking and missing values
>>>> ################################################
>>>
>>> I like the idea of two different special values, np.NA for missing values,
>>> np.IGNORE for masked values. np.NA values in an array define what was
>>> implemented in numpy.ma as a 'hard mask' (where you can't unmask data),
>>> while np.IGNOREs correspond to the .mask in numpy.ma. Looks fairly
>>> unambiguous that way.
>>>
>>>
>>>> **************
>>>> Initialization
>>>> **************
>>>>
>>>> First, missing values can be set and be displayed as ``np.NA, NA``::
>>>>
>>>> >>> np.array([1.0, 2.0, np.NA, 7.0], dtype='NA[f8]')
>>>> array([1., 2., NA, 7.], dtype='NA[<f8]')
>>>>
>>>> As the initialization is not ambiguous, this can be written without the NA
>>>> dtype::
>>>>
>>>> >>> np.array([1.0, 2.0, np.NA, 7.0])
>>>> array([1., 2., NA, 7.], dtype='NA[<f8]')
>>>>
>>>> Masked values can be set and be displayed as ``np.MASKED, MASKED``::
>>>>
>>>> >>> np.array([1.0, 2.0, np.MASKED, 7.0], masked=True)
>>>> array([1., 2., MASKED, 7.], masked=True)
>>>>
>>>> As the initialization is not ambiguous, this can be written without
>>>> ``masked=True``::
>>>>
>>>> >>> np.array([1.0, 2.0, np.MASKED, 7.0])
>>>> array([1., 2., MASKED, 7.], masked=True)
>>>
>>> I'm not happy with this 'masked' parameter, at all. What's the point?
>>> Either you have np.NAs and/or np.IGNOREs or you don't. I'm probably missing
>>> something here.
>>
>> If I put np.MASKED (I agree I prefer np.IGNORE) in the init, then
>> obviously I mean it should be masked, so the 'masked=True' here is
>> completely redundant, yes, I agree.
>> And in fact:
>>
>> np.array([1.0, 2.0, np.MASKED, 7.0], masked=False)
>>
>> should raise an error. On the other hand, if I make a normal array:
>>
>> arr = np.array([1.0, 2.0, 7.0])
>>
>> and then do this:
>>
>> arr.visible[2] = False
>>
>> then either I should raise an error (it's not a masked array), or,
>> more magically, construct a mask on the fly. This somewhat breaks
>> expectations though, because you might just have made a largish mask
>> array without having any clue that that had happened.
>
> Well, I'd expect an error to be raised when assigning an NA if the initial
> array is not NA-friendly. The 'magical' creation of a mask would be nice, but
> is probably too magic and best left alone.

I agree :)

>>>
>>>>
>>>> Direct assignment in the masked case is magic and confusing, and so
>>>> happens only via the mask::
>>>>
>>>> >>> masked_arr = np.array([1.0, 2.0, 7.0], masked=True)
>>>> >>> masked_arr[2] = np.NA
>>>> TypeError('dtype does not support NA')
>>>> >>> masked_arr[2] = np.MASKED
>>>> TypeError('float() argument must be a string or a number')
>>>> >>> masked_arr.visible[2] = False
>>>> >>> masked_arr
>>>> array([1., 2., MASKED], masked=True)
>>>
>>> What about the reverse case? When you assign a regular value to a
>>> np.NA/np.IGNORE item?
>>
>> Well, for the np.NA case, this is straightforward:
>>
>> na_arr[2] = 3
>>
>> It's just assignment. For ``masked_arr[2] = 3`` - I don't know, I
>> guess whatever we are used to. What do you think?
>
> Ahah, that depends.
> With a = np.array([1., np.NA, 3.]), then a[1]=2. should raise an error, as
> Mark suggests: you can't "unmask" a missing value, you need to create a view
> of the initial array then "unmask". It's the equivalent of a hard mask.

In this alterNEP, the NAs and the masked values are completely
different. So, if you do this:

a = np.array([1., np.NA, 3.])

then you've unambiguously asked for an array that can handle floats
and NAs, and that will be the NA[<f8] dtype by default. You didn't
ask for a masked array, you asked for an array that can carry NAs.
You can't unmask an NA, because an NA isn't a masked value, it's an
NA. So, if you do:

a[1] = 2

you just mean 'change the NA in position [1] to the value 2'. Simple as that.

> With a = np.array([1., np.IGNORE, 3.]), then a[1]=2. should give
> np.array([1.,2.,3.]) and a.mask=[False,False,False]. That's a soft mask.

Sounds reasonable to me...

Cheers,

Matthew
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
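
For reference, the soft-mask and hard-mask behaviours discussed in this thread, and the 'magical' on-the-fly creation of a mask, can be reproduced with the existing numpy.ma module. This is only a sketch of how numpy.ma behaves today. It does not use np.NA, np.IGNORE or the .visible attribute, which are proposals in the alterNEP, and the variable names are just illustrative:

```python
import numpy as np

# Soft mask (the numpy.ma default): assigning a regular value to a
# masked element stores the value and unmasks the element.
soft = np.ma.array([1.0, 2.0, 3.0], mask=[False, True, False])
soft[1] = 2.0
soft_mask_after = np.ma.getmaskarray(soft).tolist()

# Hard mask: assignment to a masked element is silently ignored;
# both the mask and the underlying data stay as they were.
hard = np.ma.array([1.0, 2.0, 3.0], mask=[False, True, False],
                   hard_mask=True)
hard[1] = 5.0
hard_mask_after = np.ma.getmaskarray(hard).tolist()
hard_data_after = float(hard.data[1])

# A masked array created without a mask carries the 'nomask' sentinel.
# Assigning np.ma.masked allocates a full boolean mask on the fly.
plain = np.ma.array([1.0, 2.0, 7.0])
had_no_mask = np.ma.getmask(plain) is np.ma.nomask
plain[2] = np.ma.masked
plain_mask_after = np.ma.getmaskarray(plain).tolist()
```

Note that numpy.ma's hard mask ignores the assignment rather than raising an error, so it is only an approximate analogue of the behaviour Pierre describes for np.NA above.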