On Thu, Jun 23, 2011 at 4:54 PM, Eric Firing <efir...@hawaii.edu> wrote:
> On 06/23/2011 11:19 AM, Nathaniel Smith wrote:
> > I'd like to see a statement of what the "missing data problem" is, and
> > how this solves it? Because I don't think this is entirely intuitive,
> > or that everyone necessarily has the same idea.
> >
> >> Reduction operations like 'sum', 'prod', 'min', and 'max' will operate
> >> as if the values weren't there
> >
> > For context: My experience with missing data is in statistical
> > analysis; I find R's NA support to be pretty awesome for those
> > purposes. The conceptual model it's based on is that an NA value is
> > some number that we just happen not to know. So from this perspective,
> > I find it pretty confusing that adding an unknown quantity to 3 should
> > result in 3, rather than another unknown quantity. (Obviously it
> > should be possible to compute the sum of the known values, but IME
> > it's important for the default behavior to be to fail loudly when
> > things are wonky, not to silently patch them up, possibly
> > incorrectly!)
>
> From the oceanographic data acquisition and analysis perspective, and
> perhaps from a more general plotting perspective (matplotlib,
> specifically) missing data is simply missing; we don't have it, we never
> will, but we need to do the best calculation (or plot) we can with what
> is left. For plotting, that generally means showing a gap in a line, a
> hole in a contour plot, etc. For calculations like basic statistics, it
> means doing the calculation, e.g. a mean, with the available numbers,
> *and* having an easy way to find out how many numbers were available.
> That's what the masked array count() method is for.

I'm thinking a parameter for sum, mean, etc which enables this interpretation is a good approach for these calculations.

> Some types of calculations, like the FFT, simply can't be done by
> ignoring missing values, so one must first use some filling method,
> perhaps interpolation, for example, and then pass an unmasked array to
> the function.

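As a rough illustration of the interpretations discussed above (this is not the API proposed in the NEP), the sketch below uses the existing numpy.ma module with made-up sample values. It shows reductions that skip masked entries together with count(), NaN used as a stand-in for an R-style NA that propagates, and filling a gap by linear interpolation before calling the FFT. The variable names and the -999.0 placeholder are assumptions made for the example.

```python
import numpy as np

# Hypothetical sample: four readings, the third one was never recorded.
values = np.array([1.0, 2.0, -999.0, 4.0])
missing = np.array([False, False, True, False])  # numpy.ma convention: True = masked

# "Ignore" interpretation: numpy.ma reductions skip masked entries.
samples = np.ma.masked_array(values, mask=missing)
total = samples.sum()      # sum of the three available numbers
mean = samples.mean()      # mean of the three available numbers
n_valid = samples.count()  # how many numbers were actually available

# "Unknown" interpretation (R-style NA), approximated here with NaN,
# which propagates through ordinary floating-point reductions.
with_nan = np.where(missing, np.nan, values)
nan_total = with_nan.sum()

# The FFT cannot skip entries, so fill the gap first (linear interpolation
# is one choice among many) and pass an ordinary ndarray to the FFT.
index = np.arange(values.size)
filled = np.interp(index, index[~missing], values[~missing])
spectrum = np.fft.fft(filled)

print(total, mean, n_valid)
print(nan_total)
print(filled)
```

Here the masked sum and mean use only the three recorded values, the NaN-based sum comes out as NaN, and the interpolated array has no gap left when it reaches np.fft.fft.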
These kinds of functions will have to raise exceptions when called on an array which has a masked value, true.

> The present masked array module is very close to what is really needed
> for the sorts of things I am involved with. It looks to me like the
> main deficiencies are addressed by Mark's proposal, although the change
> in the definition of the mask might make for a painful transition.

Yeah, I understand the pain, but I'd much prefer to align with the general consensus about masks elsewhere than stick with the current convention.

-Mark

> Eric
>
> > Also, what should 'dot' do with missing values?
> >
> > -- Nathaniel
> >
> > On Thu, Jun 23, 2011 at 1:53 PM, Mark Wiebe<mwwi...@gmail.com> wrote:
> >> Enthought has asked me to look into the "missing data" problem and how NumPy
> >> could treat it better. I've considered the different ideas of adding dtype
> >> variants with a special signal value and masked arrays, and concluded that
> >> adding masks to the core ndarray appears to be the best way to deal with the
> >> problem in general.
> >> I've written a NEP that proposes a particular design, viewable here:
> >> https://github.com/m-paradox/numpy/blob/cmaskedarray/doc/neps/c-masked-array.rst
> >> There are some questions at the bottom of the NEP which definitely need
> >> discussion to find the best design choices. Please read, and let me know of
> >> all the errors and gaps you find in the document.
> >> Thanks,
> >> Mark
> >> _______________________________________________
> >> NumPy-Discussion mailing list
> >> NumPy-Discussion@scipy.org
> >> http://mail.scipy.org/mailman/listinfo/numpy-discussion