Re: [Numpy-discussion] An NA compromise idea -- many-NA

Charles R Harris Fri, 01 Jul 2011 13:37:05 -0700

On Fri, Jul 1, 2011 at 2:33 PM, Mark Wiebe <[email protected]> wrote:


> On Fri, Jul 1, 2011 at 3:29 PM, Charles R Harris <
> [email protected]> wrote:
>
>>
>>
>> On Fri, Jul 1, 2011 at 2:26 PM, Mark Wiebe <[email protected]> wrote:
>>
>>> On Fri, Jul 1, 2011 at 3:20 PM, Mark Wiebe <[email protected]> wrote:
>>>
>>>> On Fri, Jul 1, 2011 at 3:01 PM, Skipper Seabold <[email protected]> wrote:
>>>>
>>>>> On Fri, Jul 1, 2011 at 3:46 PM, Dag Sverre Seljebotn
>>>>> <[email protected]> wrote:
>>>>> > I propose a simple idea *for the long term* for generalizing Mark's
>>>>> > proposal, that I hope may perhaps put some people behind Mark's
>>>>> > concrete proposal in the short term.
>>>>> >
>>>>> > A key feature missing in Mark's proposal is the ability to
>>>>> > distinguish between different reasons for NA-ness; IGNORE vs. NA.
>>>>> > However, one could conceive wanting to track a whole host of reasons:
>>>>> >
>>>>> > homework_grades = np.asarray([2, 3, 1, EATEN_BY_DOG, 5, SICK, 2, TOO_LAZY])
>>>>> >
>>>>> > Wouldn't it be a shame to put a lot of work into NA, but then have
>>>>> > users still keep a separate "shadow-array" for stuff like this?
>>>>> >
>>>>> > a) In this case the generality of Mark's proposal seems justified and
>>>>> > less confusing to teach newcomers (?)
>>>>> >
>>>>> > b) Since Mark's proposal seems to generalize well to many NAs
>>>>> > (there's 8 bits in the mask, and millions of available NaN-s in
>>>>> > floating point), if people agreed to this one could leave it for
>>>>> > later and just go on with the proposed idea.
>>>>> >
>>>>>
>>>>> I have not been following the discussion in much detail, so forgive me
>>>>> if this has come up. But I think this approach is also similar to the
>>>>> thinking behind missing values in SAS and "extended" missing values in
>>>>> Stata. They are missing but preserve an order. This way you can pull
>>>>> out values that are missing because they were eaten by a dog and see
>>>>> if these missing ones are systematically different than the ones that
>>>>> are missing because they're too lazy. A use case that comes to mind:
>>>>> seeing whether the various kinds of attrition in surveys or experiments
>>>>> vary in a non-random way.
>>>>>
>>>>>
>>>>> http://support.sas.com/documentation/cdl/en/lrcon/62955/HTML/default/viewer.htm#a000989180.htm
>>>>> http://www.stata.com/help.cgi?missing
>>>>
>>>>
>>>> That's interesting, and I see that they use a numerical ordering for the
>>>> different NA values. I think if instead of using the AND operator to
>>>> combine masks, we use MINIMUM, this behavior would happen naturally with
>>>> almost no additional work. Then, in addition to np.NA and np.NA(dtype), it
>>>> could allow np.NA(dtype, ID) to assign an ID between 1 and 255, where 1 is
>>>> the default.
>>>>
>>>
>>> Sorry, my brain is a bit addled by all these comments. This idea would
>>> also require flipping the mask so 0 is unmasked and 1 to 255 is masked, as
>>> Christopher pointed out in a different thread.
>>>
>>
>> Or you could subtract instead of add and use maximum instead of minimum. I
>> thought those details would be hidden.
>>
>
> Definitely, but the most natural distinction thinking numerically is
> between zero and non-zero, and there's only one zero, so giving it the
> 'unmasked' value is natural for this way of extending it. If you follow
> Joe's idea where you're basically introducing it as an image alpha mask, you
> would have 0 be fully masked, 128 be 50% masked, and 255 be fully unmasked.
>
>
I'm not complaining ;) I thought these ideas were out there from the
beginning, but maybe that was just me...

Chuck

_______________________________________________
NumPy-Discussion mailing list
[email protected]
http://mail.scipy.org/mailman/listinfo/numpy-discussion
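
For readers skimming the thread, here is a rough sketch of the many-NA mask idea discussed above. It is not the proposed NumPy API: np.NA(dtype, ID) never appears in it, and the names masked_add, PRESENT and the reason codes are made up for illustration. The thread leaves the exact convention open (MINIMUM vs. maximum depending on which end means "unmasked"); this sketch assumes 0 = unmasked and 1 to 255 = NA reason IDs, and combines masks with np.maximum so that a result is NA if either operand is, with the larger ID winning.

```python
import numpy as np

# Hypothetical reason codes; 0 means "value present" in this sketch.
PRESENT, EATEN_BY_DOG, SICK, TOO_LAZY = 0, 1, 2, 3

# Data plus a uint8 "reason" mask (the 8 bits mentioned in the thread).
grades = np.array([2, 3, 1, 0, 5, 0, 2, 0], dtype=np.int64)
reason = np.array([0, 0, 0, EATEN_BY_DOG, 0, SICK, 0, TOO_LAZY], dtype=np.uint8)

bonus = np.ones(8, dtype=np.int64)
bonus_reason = np.array([0, SICK, 0, 0, 0, SICK, 0, 0], dtype=np.uint8)


def masked_add(a, a_mask, b, b_mask):
    """Add two arrays, propagating NA reason IDs (assumed convention)."""
    # With 0 = unmasked, maximum marks the result NA if either input is NA.
    out_mask = np.maximum(a_mask, b_mask)
    # Store a placeholder 0 wherever the result is NA.
    out = np.where(out_mask == PRESENT, a + b, 0)
    return out, out_mask


total, total_reason = masked_add(grades, reason, bonus, bonus_reason)

# Statistics over present values only, and a count for one NA reason.
valid_mean = total[total_reason == PRESENT].mean()
n_sick = int(np.count_nonzero(total_reason == SICK))
```

Selecting on total_reason == SICK versus total_reason == TOO_LAZY is what lets you compare the different kinds of missingness, as in the SAS/Stata use case mentioned earlier in the thread.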
