On Fri, Jul 1, 2011 at 2:26 PM, Mark Wiebe <[email protected]> wrote:
> On Fri, Jul 1, 2011 at 3:20 PM, Mark Wiebe <[email protected]> wrote:
>>
>> On Fri, Jul 1, 2011 at 3:01 PM, Skipper Seabold <[email protected]> wrote:
>>>
>>> On Fri, Jul 1, 2011 at 3:46 PM, Dag Sverre Seljebotn
>>> <[email protected]> wrote:
>>>>
>>>> I propose a simple idea *for the long term* for generalizing Mark's
>>>> proposal, that I hope may perhaps put some people behind Mark's concrete
>>>> proposal in the short term.
>>>>
>>>> The key feature missing in Mark's proposal is the ability to distinguish
>>>> between different reasons for NA-ness: IGNORE vs. NA. However, one could
>>>> conceive wanting to track a whole host of reasons:
>>>>
>>>> homework_grades = np.asarray([2, 3, 1, EATEN_BY_DOG, 5, SICK, 2, TOO_LAZY])
>>>>
>>>> Wouldn't it be a shame to put a lot of work into NA, but then have users
>>>> still keep a separate "shadow-array" for stuff like this?
>>>>
>>>> a) In this case the generality of Mark's proposal seems justified and
>>>> less confusing to teach newcomers (?)
>>>>
>>>> b) Since Mark's proposal seems to generalize well to many NAs (there are
>>>> 8 bits in the mask, and millions of available NaNs in floating point), if
>>>> people agreed to this, one could leave it for later and just go on with
>>>> the proposed idea.
>>>
>>> I have not been following the discussion in much detail, so forgive me
>>> if this has come up, but I think this approach is also similar to the
>>> thinking behind missing values in SAS and "extended" missing values in
>>> Stata. They are missing but preserve an order. This way you can pull
>>> out the values that are missing because they were eaten by a dog and see
>>> whether these missing ones are systematically different from the ones
>>> that are missing because they're too lazy. A use case that pops to mind:
>>> seeing whether the various forms of attrition in surveys or experiments
>>> vary in a non-random way.
>>>
>>> http://support.sas.com/documentation/cdl/en/lrcon/62955/HTML/default/viewer.htm#a000989180.htm
>>> http://www.stata.com/help.cgi?missing
>>
>> That's interesting, and I see that they use a numerical ordering for the
>> different NA values. I think if instead of using the AND operator to
>> combine masks, we use MINIMUM, this behavior would happen naturally with
>> almost no additional work. Then, in addition to np.NA and np.NA(dtype),
>> it could allow np.NA(dtype, ID) to assign an ID between 1 and 255, where
>> 1 is the default.
>
> Sorry, my brain is a bit addled by all these comments. This idea would
> also require flipping the mask so that 0 is unmasked and 1 to 255 is
> masked, as Christopher pointed out in a different thread.

Or you could subtract instead of add and use maximum instead of minimum. I
thought those details would be hidden.

Chuck
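As a minimal sketch of the flipped-mask convention being discussed (this is
not the actual NumPy API; the reason constants and combine_masks helper are
made up for illustration, and plain uint8 arrays stand in for the proposed
hidden mask storage):

```python
import numpy as np

# Hypothetical convention: 0 means "valid", 1..255 is a masked element
# carrying a reason ID (the np.NA(dtype, ID) idea from the thread).
EATEN_BY_DOG, SICK, TOO_LAZY = 1, 2, 3

def combine_masks(mask_a, mask_b):
    """Combine the masks of two operands of a binary ufunc.

    With 0 = valid, np.maximum makes any masked input mask the output,
    and when both inputs are masked the larger reason ID wins -- the
    ordered-NA behavior of SAS/Stata extended missing values falls out
    with no extra work.
    """
    return np.maximum(mask_a, mask_b)

a_mask = np.array([0, EATEN_BY_DOG, 0, SICK], dtype=np.uint8)
b_mask = np.array([0, 0, TOO_LAZY, EATEN_BY_DOG], dtype=np.uint8)

out = combine_masks(a_mask, b_mask)
# element 0: both valid           -> 0 (valid)
# element 1: NA(EATEN_BY_DOG) + valid -> EATEN_BY_DOG
# element 2: valid + NA(TOO_LAZY)     -> TOO_LAZY
# element 3: NA(SICK) vs NA(EATEN_BY_DOG) -> maximum keeps SICK

# Skipper's use case -- pull out elements missing for one specific reason:
eaten = a_mask == EATEN_BY_DOG
```

Chuck's "subtract instead of add" remark is the mirror image: with 255 =
valid and descending IDs, np.minimum gives the same result, so the choice
is indeed an internal detail.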
_______________________________________________
NumPy-Discussion mailing list
[email protected]
http://mail.scipy.org/mailman/listinfo/numpy-discussion
