On Fri, Jul 1, 2011 at 3:46 PM, Dag Sverre Seljebotn <[email protected]> wrote:
> I propose a simple idea *for the long term* for generalizing Mark's
> proposal, that I hope may perhaps put some people behind Mark's concrete
> proposal in the short term.
>
> A key feature missing in Mark's proposal is the ability to distinguish
> between different reasons for NA-ness; IGNORE vs. NA. However, one could
> conceive wanting to track a whole host of reasons:
>
> homework_grades = np.asarray([2, 3, 1, EATEN_BY_DOG, 5, SICK, 2, TOO_LAZY])
>
> Wouldn't it be a shame to put a lot of work into NA, but then have users
> to still keep a separate "shadow-array" for stuff like this?
>
> a) In this case the generality of Mark's proposal seems justified and
> less confusing to teach newcomers (?)
>
> b) Since Mark's proposal seems to generalize well to many NAs (there's 8
> bits in the mask, and millions of available NaN-s in floating point), if
> people agreed to this one could leave it for later and just go on with
> the proposed idea.
>
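To make the "millions of available NaN-s" remark a bit more concrete, here is a minimal pure-Python sketch of storing a reason code in the payload bits of a float64 NaN. It is only an illustration, not Mark's proposal and not a NumPy API: the helpers make_na and na_reason and the numeric reason codes are hypothetical names made up for this example, and it assumes IEEE 754 doubles.

```python
import math
import struct

# Hypothetical reason codes, named after the quoted example.
EATEN_BY_DOG, SICK, TOO_LAZY = 1, 2, 3

_QUIET_NAN_BITS = 0x7FF8000000000000  # exponent all ones, quiet bit set
_PAYLOAD_MASK = (1 << 51) - 1         # the remaining 51 mantissa bits


def make_na(reason):
    """Return a NaN whose payload bits hold `reason` (hypothetical helper)."""
    if not 0 < reason <= _PAYLOAD_MASK:
        raise ValueError("reason code out of range")
    bits = _QUIET_NAN_BITS | reason
    return struct.unpack("<d", struct.pack("<Q", bits))[0]


def na_reason(value):
    """Return the reason code stored in a NaN, or None for ordinary values."""
    if not math.isnan(value):
        return None
    bits = struct.unpack("<Q", struct.pack("<d", value))[0]
    return bits & _PAYLOAD_MASK


homework_grades = [2.0, 3.0, 1.0, make_na(EATEN_BY_DOG), 5.0,
                   make_na(SICK), 2.0, make_na(TOO_LAZY)]

# Recover the reason for each entry; ordinary grades map to None.
reasons = [na_reason(g) for g in homework_grades]
```

Every tagged entry still behaves as a NaN for math.isnan, so code that only cares about "missing or not" keeps working. One caveat of this sketch: payload bits are not guaranteed to survive arithmetic, so the reason codes should be read before any computation on the values.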
I have not been following the discussion in much detail, so forgive me if
this has come up. But I think this approach is also similar to the thinking
behind missing values in SAS and "extended" missing values in Stata. They
are missing but preserve an order. This way you can pull out the values that
are missing because they were eaten by a dog and see whether they are
systematically different from the ones that are missing because they're too
lazy. One use case that comes to mind: checking whether the various kinds of
attrition in surveys or experiments vary in a non-random way.

http://support.sas.com/documentation/cdl/en/lrcon/62955/HTML/default/viewer.htm#a000989180.htm
http://www.stata.com/help.cgi?missing

Maybe this is neither here nor there. I just don't want us to end up with
"the R way is the only way." That's why I prefer Python :)

Skipper
_______________________________________________
NumPy-Discussion mailing list
[email protected]
http://mail.scipy.org/mailman/listinfo/numpy-discussion
