On Fri, Jul 1, 2011 at 3:20 PM, Mark Wiebe <[email protected]> wrote:
> On Fri, Jul 1, 2011 at 3:01 PM, Skipper Seabold <[email protected]> wrote:
>
>> On Fri, Jul 1, 2011 at 3:46 PM, Dag Sverre Seljebotn
>> <[email protected]> wrote:
>> > I propose a simple idea *for the long term* for generalizing Mark's
>> > proposal, that I hope may perhaps put some people behind Mark's concrete
>> > proposal in the short term.
>> >
>> > A key feature missing in Mark's proposal is the ability to distinguish
>> > between different reasons for NA-ness; IGNORE vs. NA. However, one could
>> > conceive wanting to track a whole host of reasons:
>> >
>> > homework_grades = np.asarray([2, 3, 1, EATEN_BY_DOG, 5, SICK, 2, TOO_LAZY])
>> >
>> > Wouldn't it be a shame to put a lot of work into NA, but then have users
>> > still keep a separate "shadow-array" for stuff like this?
>> >
>> > a) In this case the generality of Mark's proposal seems justified and
>> > less confusing to teach newcomers (?)
>> >
>> > b) Since Mark's proposal seems to generalize well to many NAs (there's 8
>> > bits in the mask, and millions of available NaN-s in floating point), if
>> > people agreed to this one could leave it for later and just go on with
>> > the proposed idea.
>> >
>>
>> I have not been following the discussion in much detail, so forgive me
>> if this has come up. But I think this approach is also similar to the
>> thinking behind missing values in SAS and "extended" missing values in
>> Stata. They are missing but preserve an order. This way you can pull
>> out values that are missing because they were eaten by a dog and see
>> if these missing ones are systematically different than the ones that
>> are missing because they're too lazy. Use case that pops to mind:
>> seeing if the various ways of attrition in surveys or experiments
>> vary in a non-random way.
>>
>>
>> http://support.sas.com/documentation/cdl/en/lrcon/62955/HTML/default/viewer.htm#a000989180.htm
>> http://www.stata.com/help.cgi?missing
>
>
> That's interesting, and I see that they use a numerical ordering for the
> different NA values. I think if instead of using the AND operator to combine
> masks, we use MINIMUM, this behavior would happen naturally with almost no
> additional work. Then, in addition to np.NA and np.NA(dtype), it could allow
> np.NA(dtype, ID) to assign an ID between 1 and 255, where 1 is the default.
>

Sorry, my brain is a bit addled by all these comments. This idea would also
require flipping the mask so that 0 is unmasked and 1 to 255 is masked, as
Christopher pointed out in a different thread.

-Mark

>
> -Mark
>
>
>>
>>
>> Maybe this is neither here nor there, I just don't want to end up with
>> the R way is the only way. That's why I prefer Python :)
>>
>> Skipper
>> _______________________________________________
>> NumPy-Discussion mailing list
>> [email protected]
>> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>>
>
>
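
A rough sketch of one possible reading of the MINIMUM idea with the flipped
mask (0 is unmasked, 1 to 255 are NA IDs). Everything named here
(combine_masks, the ID constants) is hypothetical and not an existing NumPy
API. One assumption to flag: a plain np.minimum would let an unmasked 0 win
over any NA ID, so this sketch takes the minimum only where both inputs are
masked and otherwise keeps whichever ID is present.

```python
import numpy as np

# Hypothetical convention for this sketch: 0 = unmasked, 1..255 = NA ID.
UNMASKED = 0
EATEN_BY_DOG, SICK, TOO_LAZY = 1, 2, 3  # made-up NA IDs, ordered by value


def combine_masks(a, b):
    """Combine two uint8 masks elementwise.

    The result is masked wherever either input is masked. Where both
    inputs are masked, the smaller NA ID is kept (the MINIMUM rule).
    """
    a = np.asarray(a, dtype=np.uint8)
    b = np.asarray(b, dtype=np.uint8)
    both_masked = (a != UNMASKED) & (b != UNMASKED)
    # If at most one side is masked, the other side is 0, so the maximum
    # is simply the ID that is present (or 0 if neither side is masked).
    return np.where(both_masked, np.minimum(a, b), np.maximum(a, b))


mask_a = [UNMASKED, EATEN_BY_DOG, TOO_LAZY, UNMASKED]
mask_b = [UNMASKED, UNMASKED, SICK, SICK]
print(combine_masks(mask_a, mask_b).tolist())  # [0, 1, 2, 2]
```

Whether the smaller or the larger ID should win when both operands are NA is
a design choice that the thread leaves open; swapping np.minimum for
np.maximum in the both-masked branch gives the other ordering.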
_______________________________________________
NumPy-Discussion mailing list
[email protected]
http://mail.scipy.org/mailman/listinfo/numpy-discussion
