On Sat, Jun 25, 2011 at 9:21 AM, Charles R Harris <[email protected]> wrote:
> On Sat, Jun 25, 2011 at 5:29 AM, Pierre GM <[email protected]> wrote:
>
>> This thread is getting quite long, innit ?
>> And I think it's getting a tad confusing, because we're mixing two
>> different concepts: missing values and masks.
>> There should be support for missing values in numpy.core, I think we all
>> agree on that.
>> * What's been suggested of adding new dtypes (nafloat, naint) is great, but
>> why not make it the default, then ?
>> * Operations involving a NA (whatever the NA actually is, depending on the
>> dtype of the input) should result in a NA (whatever the NA defined by the
>> output's dtype). That could be done by overloading the existing ufuncs to
>> support the new dtypes.
>> * There should be some simple methods to retrieve the location of those
>> NAs in an array. Whether we just output the indices or a full boolean array
>> (w/ True for a NA, False for a non-NA or vice-versa) needs to be decided.
>> * We can always re-implement masked arrays to use these NAs in a way which
>> would be consistent with numpy.ma (so as not to confuse existing users of
>> numpy.ma): a mask would be a boolean array with the same shape as the
>> underlying ndarray, with True for NA.
>> Mark, I'd suggest you modify your proposal, making it clearer that it's
>> not to add all of numpy.ma functionalities in the core, but just support
>> these missing values. Using the term 'mask' should be avoided as much as
>> possible, use 'missing data' or whatever.
>>
>
> I think he aims to support both. One complication with masks is keeping
> them tied to the data on disk. With NA values one file can contain both the
> data and the missing data markers, whereas with masks, two files would be
> required. I don't think that will fly in the long run unless there is some
> standard file format, like GeoTIFF for GIS, that combines both.
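
For readers following the quoted bullets, here is a rough sketch of the
behaviour being asked for. It is only an illustration under one loud
assumption: floating-point NaN stands in for the proposed NA, since the
nafloat/naint dtypes are a suggestion in this thread and not something the
code below relies on. It shows NA-like propagation through an existing ufunc,
the two ways of locating the missing entries (boolean array or indices), and
the same boolean array reused as a numpy.ma mask with True meaning missing.

```python
import numpy as np

# Assumption: NaN is a stand-in for the proposed NA value. The NA dtypes
# (nafloat, naint) discussed in the thread are not used here.
data = np.array([1.0, np.nan, 3.0, np.nan])

# Arithmetic through an existing ufunc propagates the NaN stand-in.
doubled = np.add(data, data)

# Locating the missing entries: a full boolean array, or just the indices.
na_mask = np.isnan(doubled)           # True marks a missing value
na_indices = np.flatnonzero(na_mask)  # positions of the missing values

# The same boolean array works as a numpy.ma mask (True means masked).
masked = np.ma.masked_array(doubled, mask=na_mask)
total = masked.sum()                  # masked entries are skipped

print(na_mask, na_indices, total)
```

Whether the boolean array or the index list is the better default is exactly
the open question in the bullet above; the sketch only shows that both are
cheap to derive from one another.
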
>

Before I was leaning mostly towards masks, but now that I've come up with
an NA bit pattern approach that feels reasonable, I think implementing both
together is on the table. Bringing up the file format issue is good; that
hasn't been covered in the NEP yet.

-Mark

>
> Chuck
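
The file format point can be sketched as follows. This is a hedged
illustration, not a proposal for the NEP: NaN again stands in for an NA bit
pattern, in-memory buffers stand in for files on disk, and the variable names
are made up for the example. With an in-band marker a single saved array
carries both data and missingness, while the mask approach needs a second
array that has to be kept alongside the first.

```python
import io
import numpy as np

values = np.array([1.0, 2.0, 3.0])
missing = np.array([False, True, False])

# In-band markers: NaN (a stand-in for an NA bit pattern) lives in the data,
# so one array written to one file carries everything.
in_band = np.where(missing, np.nan, values)
one_file = io.BytesIO()
np.save(one_file, in_band)
one_file.seek(0)
restored = np.load(one_file)

# Separate mask: data and mask are two arrays that must travel together.
# They are written to two buffers here to mimic two files on disk.
data_file, mask_file = io.BytesIO(), io.BytesIO()
np.save(data_file, values)
np.save(mask_file, missing)
data_file.seek(0)
mask_file.seek(0)
remasked = np.ma.masked_array(np.load(data_file), mask=np.load(mask_file))

print(np.isnan(restored), remasked.mask)
```

A container format that bundles both arrays would remove the two-file
problem, which is the role a standard combined format would play in the
quoted comment.
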
_______________________________________________
NumPy-Discussion mailing list
[email protected]
http://mail.scipy.org/mailman/listinfo/numpy-discussion
