On Fri, Jul 1, 2011 at 2:26 PM, Mark Wiebe <[email protected]> wrote:
> On Fri, Jul 1, 2011 at 3:20 PM, Mark Wiebe <[email protected]> wrote:
>>
>> On Fri, Jul 1, 2011 at 3:01 PM, Skipper Seabold <[email protected]> wrote:
>>>
>>> On Fri, Jul 1, 2011 at 3:46 PM, Dag Sverre Seljebotn
>>> <[email protected]> wrote:
>>>>
>>>> I propose a simple idea *for the long term* for generalizing Mark's
>>>> proposal, that I hope may perhaps put some people behind Mark's concrete
>>>> proposal in the short term.
>>>>
>>>> The key feature missing in Mark's proposal is the ability to distinguish
>>>> between different reasons for NA-ness: IGNORE vs. NA. However, one could
>>>> conceive wanting to track a whole host of reasons:
>>>>
>>>> homework_grades = np.asarray([2, 3, 1, EATEN_BY_DOG, 5, SICK, 2, TOO_LAZY])
>>>>
>>>> Wouldn't it be a shame to put a lot of work into NA, but then have users
>>>> still keep a separate "shadow-array" for stuff like this?
>>>>
>>>> a) In this case the generality of Mark's proposal seems justified and
>>>> less confusing to teach newcomers (?)
>>>>
>>>> b) Since Mark's proposal seems to generalize well to many NAs (there are
>>>> 8 bits in the mask, and millions of available NaNs in floating point), if
>>>> people agreed to this, one could leave it for later and just go on with
>>>> the proposed idea.
>>>
>>> I have not been following the discussion in much detail, so forgive me
>>> if this has come up, but I think this approach is also similar to the
>>> thinking behind missing values in SAS and "extended" missing values in
>>> Stata. They are missing but preserve an order. This way you can pull
>>> out the values that are missing because they were eaten by a dog and see
>>> whether these missing ones are systematically different from the ones
>>> that are missing because they're too lazy. A use case that pops to mind:
>>> seeing whether the various forms of attrition in surveys or experiments
>>> vary in a non-random way.
>>>
>>> http://support.sas.com/documentation/cdl/en/lrcon/62955/HTML/default/viewer.htm#a000989180.htm
>>> http://www.stata.com/help.cgi?missing
>>
>> That's interesting, and I see that they use a numerical ordering for the
>> different NA values. I think if instead of using the AND operator to
>> combine masks, we use MINIMUM, this behavior would happen naturally with
>> almost no additional work. Then, in addition to np.NA and np.NA(dtype),
>> it could allow np.NA(dtype, ID) to assign an ID between 1 and 255, where
>> 1 is the default.
>
> Sorry, my brain is a bit addled by all these comments. This idea would
> also require flipping the mask so that 0 is unmasked and 1 to 255 is
> masked, as Christopher pointed out in a different thread.

Or you could subtract instead of add and use maximum instead of minimum. I
thought those details would be hidden.

Chuck
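As a minimal sketch of the flipped-mask convention being discussed (this is
not the actual NumPy API; the reason constants and combine_masks helper are
made up for illustration, and plain uint8 arrays stand in for the proposed
hidden mask storage):

```python
import numpy as np

# Hypothetical convention: 0 means "valid", 1..255 is a masked element
# carrying a reason ID (the np.NA(dtype, ID) idea from the thread).
EATEN_BY_DOG, SICK, TOO_LAZY = 1, 2, 3

def combine_masks(mask_a, mask_b):
    """Combine the masks of two operands of a binary ufunc.

    With 0 = valid, np.maximum makes any masked input mask the output,
    and when both inputs are masked the larger reason ID wins -- the
    ordered-NA behavior of SAS/Stata extended missing values falls out
    with no extra work.
    """
    return np.maximum(mask_a, mask_b)

a_mask = np.array([0, EATEN_BY_DOG, 0, SICK], dtype=np.uint8)
b_mask = np.array([0, 0, TOO_LAZY, EATEN_BY_DOG], dtype=np.uint8)

out = combine_masks(a_mask, b_mask)
# element 0: both valid           -> 0 (valid)
# element 1: NA(EATEN_BY_DOG) + valid -> EATEN_BY_DOG
# element 2: valid + NA(TOO_LAZY)     -> TOO_LAZY
# element 3: NA(SICK) vs NA(EATEN_BY_DOG) -> maximum keeps SICK

# Skipper's use case -- pull out elements missing for one specific reason:
eaten = a_mask == EATEN_BY_DOG
```

Chuck's "subtract instead of add" remark is the mirror image: with 255 =
valid and descending IDs, np.minimum gives the same result, so the choice
is indeed an internal detail.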
_______________________________________________
NumPy-Discussion mailing list
[email protected]
http://mail.scipy.org/mailman/listinfo/numpy-discussion
