On Fri, Jul 1, 2011 at 3:01 PM, Skipper Seabold <[email protected]> wrote:
> On Fri, Jul 1, 2011 at 3:46 PM, Dag Sverre Seljebotn
> <[email protected]> wrote:
> > I propose a simple idea *for the long term* for generalizing Mark's
> > proposal, that I hope may perhaps put some people behind Mark's concrete
> > proposal in the short term.
> >
> > A key feature missing in Mark's proposal is the ability to distinguish
> > between different reasons for NA-ness; IGNORE vs. NA. However, one could
> > conceive wanting to track a whole host of reasons:
> >
> > homework_grades = np.asarray([2, 3, 1, EATEN_BY_DOG, 5, SICK, 2, TOO_LAZY])
> >
> > Wouldn't it be a shame to put a lot of work into NA, but then have users
> > still keep a separate "shadow-array" for stuff like this?
> >
> > a) In this case the generality of Mark's proposal seems justified and
> > less confusing to teach newcomers (?)
> >
> > b) Since Mark's proposal seems to generalize well to many NAs (there's 8
> > bits in the mask, and millions of available NaN-s in floating point), if
> > people agreed to this one could leave it for later and just go on with
> > the proposed idea.
> >
>
> I have not been following the discussion in much detail, so forgive me
> if this has come up. But I think this approach is also similar to the
> thinking behind missing values in SAS and "extended" missing values in
> Stata. They are missing but preserve an order. This way you can pull
> out values that are missing because they were eaten by a dog and see
> if these missing ones are systematically different from the ones that
> are missing because they're too lazy. A use case that pops to mind:
> seeing if the various ways of attrition in surveys or experiments
> vary in a non-random way.
>
> http://support.sas.com/documentation/cdl/en/lrcon/62955/HTML/default/viewer.htm#a000989180.htm
> http://www.stata.com/help.cgi?missing

That's interesting, and I see that they use a numerical ordering for the
different NA values.
I think if, instead of using the AND operator to combine masks, we use
MINIMUM, this behavior would happen naturally with almost no additional
work. Then, in addition to np.NA and np.NA(dtype), it could allow
np.NA(dtype, ID) to assign an ID between 1 and 255, where 1 is the default.

-Mark

> Maybe this is neither here nor there, I just don't want to end up with
> the R way is the only way. That's why I prefer Python :)
>
> Skipper
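
To make the MINIMUM idea above concrete, here is a minimal sketch in plain
NumPy. It does not use np.NA, which is only a proposal in this thread. The
encoding is an assumption of the sketch, not something the thread specifies:
NA reasons take IDs from 1 to 255, and a valid element carries a hypothetical
sentinel (VALID = 256) larger than every NA ID, so that an elementwise minimum
lets any NA win over a valid element and the lower ID win between two NAs. The
reason names and their IDs are illustrative only.

```python
import numpy as np

# Assumed encoding (hypothetical, not from the thread): NA IDs are 1..255,
# with 1 as the default NA; VALID is a sentinel above every NA ID.
VALID = 256
NA_DEFAULT = 1
EATEN_BY_DOG, SICK, TOO_LAZY = 2, 3, 4  # illustrative reason IDs


def combine_masks(mask_a, mask_b):
    """Combine two NA-code masks elementwise; the lowest code wins."""
    return np.minimum(mask_a, mask_b)


a = np.array([VALID, SICK, VALID, TOO_LAZY], dtype=np.uint16)
b = np.array([VALID, VALID, EATEN_BY_DOG, SICK], dtype=np.uint16)
combined = combine_masks(a, b)  # codes: 256, 3, 2, 3

# With only two states (1 = valid, 0 = NA), MINIMUM gives the same result
# as AND, so the two-state mask is a special case of this scheme.
x = np.array([1, 1, 0, 0], dtype=np.uint8)
y = np.array([1, 0, 1, 0], dtype=np.uint8)
assert np.array_equal(np.minimum(x, y), x & y)

# Pull out the positions that are missing for one particular reason.
sick_positions = np.flatnonzero(combined == SICK)  # indices 1 and 3
```

Because the codes are ordered, selecting by reason (as in the SAS and Stata
schemes mentioned above) is an ordinary comparison on the mask, with no
separate shadow array.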
_______________________________________________
NumPy-Discussion mailing list
[email protected]
http://mail.scipy.org/mailman/listinfo/numpy-discussion
