On Fri, Jul 1, 2011 at 2:33 PM, Mark Wiebe <[email protected]> wrote:
> On Fri, Jul 1, 2011 at 3:29 PM, Charles R Harris <[email protected]> wrote:
>
>> On Fri, Jul 1, 2011 at 2:26 PM, Mark Wiebe <[email protected]> wrote:
>>
>>> On Fri, Jul 1, 2011 at 3:20 PM, Mark Wiebe <[email protected]> wrote:
>>>
>>>> On Fri, Jul 1, 2011 at 3:01 PM, Skipper Seabold <[email protected]> wrote:
>>>>
>>>>> On Fri, Jul 1, 2011 at 3:46 PM, Dag Sverre Seljebotn
>>>>> <[email protected]> wrote:
>>>>> > I propose a simple idea *for the long term* for generalizing Mark's
>>>>> > proposal, that I hope may perhaps put some people behind Mark's
>>>>> > concrete proposal in the short term.
>>>>> >
>>>>> > A key feature missing in Mark's proposal is the ability to
>>>>> > distinguish between different reasons for NA-ness; IGNORE vs. NA.
>>>>> > However, one could conceive wanting to track a whole host of
>>>>> > reasons:
>>>>> >
>>>>> > homework_grades = np.asarray([2, 3, 1, EATEN_BY_DOG, 5, SICK, 2, TOO_LAZY])
>>>>> >
>>>>> > Wouldn't it be a shame to put a lot of work into NA, but then have
>>>>> > users still keep a separate "shadow-array" for stuff like this?
>>>>> >
>>>>> > a) In this case the generality of Mark's proposal seems justified
>>>>> > and less confusing to teach newcomers (?)
>>>>> >
>>>>> > b) Since Mark's proposal seems to generalize well to many NAs
>>>>> > (there's 8 bits in the mask, and millions of available NaN-s in
>>>>> > floating point), if people agreed to this one could leave it for
>>>>> > later and just go on with the proposed idea.
>>>>> >
>>>>>
>>>>> I have not been following the discussion in much detail, so forgive me
>>>>> if this has come up. But I think this approach is also similar to the
>>>>> thinking behind missing values in SAS and "extended" missing values in
>>>>> Stata. They are missing but preserve an order. This way you can pull
>>>>> out values that are missing because they were eaten by a dog and see
>>>>> if these missing ones are systematically different than the ones that
>>>>> are missing because they're too lazy. A use case that pops to mind:
>>>>> seeing if the various ways of attrition in surveys or experiments
>>>>> vary in a non-random way.
>>>>>
>>>>> http://support.sas.com/documentation/cdl/en/lrcon/62955/HTML/default/viewer.htm#a000989180.htm
>>>>> http://www.stata.com/help.cgi?missing
>>>>
>>>> That's interesting, and I see that they use a numerical ordering for
>>>> the different NA values. I think if instead of using the AND operator
>>>> to combine masks, we use MINIMUM, this behavior would happen naturally
>>>> with almost no additional work. Then, in addition to np.NA and
>>>> np.NA(dtype), it could allow np.NA(dtype, ID) to assign an ID between
>>>> 1 and 255, where 1 is the default.
>>>
>>> Sorry, my brain is a bit addled by all these comments. This idea would
>>> also require flipping the mask so 0 is unmasked and 1 to 255 is masked,
>>> as Christopher pointed out in a different thread.
>>
>> Or you could subtract instead of add and use maximum instead of minimum.
>> I thought those details would be hidden.
>
> Definitely, but the most natural distinction thinking numerically is
> between zero and non-zero, and there's only one zero, so giving it the
> 'unmasked' value is natural for this way of extending it. If you follow
> Joe's idea where you're basically introducing it as an image alpha mask,
> you would have 0 be fully masked, 128 be 50% masked, and 255 be fully
> unmasked.

I'm not complaining ;) I thought these ideas were out there from the
beginning, but maybe that was just me...

Chuck
_______________________________________________
NumPy-Discussion mailing list
[email protected]
http://mail.scipy.org/mailman/listinfo/numpy-discussion
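
The mask-combining idea in the thread above can be sketched in a few lines. This is only an illustration under stated assumptions, not the proposed NumPy API: it assumes the flipped convention (0 means unmasked, 1 to 255 is an NA reason ID), and the reason names and their ID values are hypothetical. Under that convention an elementwise maximum keeps an element masked whenever either input is masked; if 255 is taken as the unmasked value instead, an elementwise minimum does the same job.

```python
import numpy as np

# Assumed convention (one reading of the thread, not an actual NumPy API):
# 0 means "unmasked"; 1..255 is an NA reason ID.
# The reason names and ID values below are hypothetical.
EATEN_BY_DOG, SICK, TOO_LAZY = 1, 2, 3

mask_a = np.array([0, EATEN_BY_DOG, 0, SICK], dtype=np.uint8)
mask_b = np.array([0, 0, TOO_LAZY, TOO_LAZY], dtype=np.uint8)

# With 0 as "unmasked", an element of the result is unmasked only when it
# is unmasked in both inputs; otherwise the larger reason ID wins.
combined = np.maximum(mask_a, mask_b)

# Opposite convention: 255 is "unmasked" and lower values are masked.
# Subtracting from 255 and taking the minimum gives the same information.
flipped = np.minimum(255 - mask_a, 255 - mask_b)

print(combined)       # reason IDs with 0 as unmasked
print(255 - flipped)  # the same IDs recovered from the flipped encoding
```

Which reason should win when two different IDs meet is a design choice the thread leaves open; the sketch simply lets the numeric ordering decide.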
