Clearly there are some overlaps between what masked arrays are
trying to achieve and what R's NA mechanisms are trying to achieve.
Are they really similar enough that they should function using
the same API?

Yes.

And if so, won't that be confusing?

No, I don't believe so, any more than NAs in R, NaNs, or Infs are already
confusing.

As one who's been silently following (most of) this thread, and a heavy R and numpy user, perhaps I should chime in briefly here with a use case. I more-or-less always work with partially masked data, like Matthew, but I don't use numpy masked arrays because the memory overhead is prohibitive. And, sad to say, my experiments don't always go perfectly. I therefore have arrays in which there is /both/ (1) data that is simply missing (np.NA?)--it never had a value and never will--as well as, simultaneously, (2) data that is temporarily masked (np.IGNORE? np.MASKED?), where I want to mask/unmask different portions for different purposes/analyses. I consider these two separate, completely independent issues, and unfortunately I currently have to kludge a lot to handle this.

Concretely, consider a list of 100,000 observations (rows), with 12 measures per observation-row (a 100,000 x 12 array). Every now and then, sprinkled throughout this array, I have missing values (someone didn't answer a question, or a computer failed to record a response, or whatever). For some analyses I want to mask the whole row (e.g., complete-case analysis), leaving me with array entries that can carry any of 4 possible labels:

1) not masked, not missing
2) masked, not missing
3) not masked, missing
4) masked, missing

Obviously #4 is "overkill" ... but only until I want to unmask that row. At that point, I need to be sure that missing values remain missing when unmasked. Can a single API really handle this?
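
To make the four labels concrete, here is a minimal sketch using only plain numpy and two independent boolean arrays. It is not a proposed API: np.NA does not exist in released numpy, so NaN stands in for "missing" here, and the names `missing`, `masked`, `labels` and `usable` are hypothetical. A tiny 6 x 12 array stands in for the 100,000 x 12 one.

```python
import numpy as np

# Hypothetical stand-in for the 100,000 x 12 array described above.
rng = np.random.default_rng(0)
data = rng.normal(size=(6, 12))

# Assumption: NaN plays the role of "missing" (never had a value, never will).
data[1, 3] = np.nan
data[4, 0] = np.nan
data[4, 7] = np.nan

# Permanent property of the data: True where no value was ever recorded.
missing = np.isnan(data)

# Temporary, analysis-specific mask: True where an entry should be ignored.
masked = np.zeros(data.shape, dtype=bool)

# Complete-case analysis: mask every row that has at least one missing value.
incomplete_rows = missing.any(axis=1)
masked[incomplete_rows, :] = True

# Encode the four labels per entry: bit 0 = masked, bit 1 = missing.
#   0: not masked, not missing    1: masked, not missing
#   2: not masked, missing        3: masked, missing
labels = masked.astype(np.uint8) + 2 * missing.astype(np.uint8)
label_counts = np.bincount(labels.ravel(), minlength=4)


def usable(missing, masked):
    """Entries that may enter a computation: neither missing nor masked."""
    return ~(missing | masked)


n_usable_masked = int(usable(missing, masked).sum())

# Unmasking touches only `masked`; `missing` is untouched, so missing
# values stay missing after the rows are unmasked.
masked[:] = False
n_usable_unmasked = int(usable(missing, masked).sum())
```

Because the two arrays are kept separate, clearing the mask cannot resurrect a missing value. A full-size boolean mask is used only for clarity; it does nothing about the memory overhead mentioned earlier.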

-best
Gary


_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
