On Fri, Jul 1, 2011 at 2:42 PM, Mark Wiebe <[email protected]> wrote:
> On Fri, Jul 1, 2011 at 3:36 PM, Charles R Harris <[email protected]> wrote:
>
>> On Fri, Jul 1, 2011 at 2:33 PM, Mark Wiebe <[email protected]> wrote:
>>
>>> On Fri, Jul 1, 2011 at 3:29 PM, Charles R Harris <[email protected]> wrote:
>>>
>>>> On Fri, Jul 1, 2011 at 2:26 PM, Mark Wiebe <[email protected]> wrote:
>>>>
>>>>> On Fri, Jul 1, 2011 at 3:20 PM, Mark Wiebe <[email protected]> wrote:
>>>>>
>>>>>> On Fri, Jul 1, 2011 at 3:01 PM, Skipper Seabold <[email protected]> wrote:
>>>>>>
>>>>>>> On Fri, Jul 1, 2011 at 3:46 PM, Dag Sverre Seljebotn <[email protected]> wrote:
>>>>>>> > I propose a simple idea *for the long term* for generalizing Mark's
>>>>>>> > proposal, that I hope may perhaps put some people behind Mark's concrete
>>>>>>> > proposal in the short term.
>>>>>>> >
>>>>>>> > A key feature missing in Mark's proposal is the ability to distinguish
>>>>>>> > between different reasons for NA-ness; IGNORE vs. NA. However, one could
>>>>>>> > conceive wanting to track a whole host of reasons:
>>>>>>> >
>>>>>>> > homework_grades = np.asarray([2, 3, 1, EATEN_BY_DOG, 5, SICK, 2, TOO_LAZY])
>>>>>>> >
>>>>>>> > Wouldn't it be a shame to put a lot of work into NA, but then have users
>>>>>>> > still keep a separate "shadow-array" for stuff like this?
>>>>>>> >
>>>>>>> > a) In this case the generality of Mark's proposal seems justified and
>>>>>>> > less confusing to teach newcomers (?)
>>>>>>> >
>>>>>>> > b) Since Mark's proposal seems to generalize well to many NAs (there's 8
>>>>>>> > bits in the mask, and millions of available NaN-s in floating point), if
>>>>>>> > people agreed to this one could leave it for later and just go on with
>>>>>>> > the proposed idea.
>>>>>>> >
>>>>>>>
>>>>>>> I have not been following the discussion in much detail, so forgive me
>>>>>>> if this has come up.
>>>>>>> But I think this approach is also similar to the
>>>>>>> thinking behind missing values in SAS and "extended" missing values in
>>>>>>> Stata. They are missing but preserve an order. This way you can pull
>>>>>>> out values that are missing because they were eaten by a dog and see
>>>>>>> if these missing ones are systematically different than the ones that
>>>>>>> are missing because they're too lazy. One use case that pops to mind:
>>>>>>> seeing if the various ways of attrition in surveys or experiments
>>>>>>> vary in a non-random way.
>>>>>>>
>>>>>>> http://support.sas.com/documentation/cdl/en/lrcon/62955/HTML/default/viewer.htm#a000989180.htm
>>>>>>> http://www.stata.com/help.cgi?missing
>>>>>>
>>>>>> That's interesting, and I see that they use a numerical ordering for
>>>>>> the different NA values. I think if instead of using the AND operator to
>>>>>> combine masks, we use MINIMUM, this behavior would happen naturally with
>>>>>> almost no additional work. Then, in addition to np.NA and np.NA(dtype), it
>>>>>> could allow np.NA(dtype, ID) to assign an ID between 1 and 255, where 1 is
>>>>>> the default.
>>>>>
>>>>> Sorry, my brain is a bit addled by all these comments. This idea would
>>>>> also require flipping the mask so 0 is unmasked and 1 to 255 is masked, as
>>>>> Christopher pointed out in a different thread.
>>>>
>>>> Or you could subtract instead of add and use maximum instead of minimum.
>>>> I thought those details would be hidden.
>>>
>>> Definitely, but the most natural distinction thinking numerically is
>>> between zero and non-zero, and there's only one zero, so giving it the
>>> 'unmasked' value is natural for this way of extending it. If you follow
>>> Joe's idea where you're basically introducing it as an image alpha mask, you
>>> would have 0 be fully masked, 128 be 50% masked, and 255 be fully unmasked.
>>>
>> I'm not complaining ;) I thought these ideas were out there from the
>> beginning, but maybe that was just me...
>
> You're right, but it feels like it's been 10 years in internet time by now.
> :)
>
> The design has evolved a lot from all the feedback too, so revisiting some
> of these things that initially may have felt like less of a fit
> doesn't hurt. I'm not so keen on rereading 250+ email messages though...

I wouldn't worry about it too much. You chose masks as one of the
fundamental options because of their generality, and this is one of the
consequences of that generality. I was also thinking about this in terms
of Pierre's soft/hard mask distinction; I don't know about the shared
mask thing.

Several questions that have also been floating about in my mind are
these: Can you mask an array with NA values? Can you mask a masked array
with a view?

Chuck
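
For concreteness, here is a minimal sketch of the multi-valued mask idea
discussed above, under one possible reading of the thread: 0 means
unmasked, 1 to 255 is an NA "reason" ID, and masks are combined with
maximum rather than AND. The ID constants and the combine_masks helper
are hypothetical names for illustration only; none of this is an
existing NumPy API, and np.NA is not used.

```python
import numpy as np

# Assumed convention (hypothetical): 0 = unmasked, 1..255 = masked,
# where the nonzero value records the reason the entry is missing.
UNMASKED = 0
EATEN_BY_DOG, SICK, TOO_LAZY = 1, 2, 3  # illustrative reason IDs

def combine_masks(mask_a, mask_b):
    """Combine two uint8 masks elementwise.

    An element is unmasked only if it is unmasked in both inputs;
    when both are masked, the larger reason ID is kept.
    """
    return np.maximum(mask_a, mask_b)

grades = np.array([2, 3, 1, 0, 5, 0, 2, 0])
mask_a = np.array([0, 0, 0, EATEN_BY_DOG, 0, SICK, 0, TOO_LAZY], dtype=np.uint8)
mask_b = np.array([0, SICK, 0, 0, 0, TOO_LAZY, 0, 0], dtype=np.uint8)

combined = combine_masks(mask_a, mask_b)

# Mean over the entries that are unmasked in the combined mask.
valid_mean = grades[combined == UNMASKED].mean()

# Pull out the positions that are missing for one specific reason.
lazy_positions = np.flatnonzero(combined == TOO_LAZY)
```

With the opposite convention (a single "unmasked" value at the top of
the range), the same propagation would use np.minimum instead, which is
the subtract/maximum versus add/minimum symmetry mentioned above.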
_______________________________________________
NumPy-Discussion mailing list
[email protected]
http://mail.scipy.org/mailman/listinfo/numpy-discussion
