Re: data_processor..mask_missing_data - What is the definition of a missing value?

Goodman, Alexander (398K) Sun, 26 Nov 2017 16:33:41 -0800

Hi Michael,

The purpose of that function is to propagate missing data from one dataset
to all others so that the evaluation is consistent. Depending on which
loader is used, datasets without missing data do not use masked arrays for
the values attribute (see CLIMATE-819). I would just modify the function to
continue through the loop if the current dataset's values attribute is not
a masked array. Requiring masked arrays as dataset values for all ocw
workflows might make these situations easier to avoid, but I think it's a
bad idea: numpy's masked arrays are known to have a significant performance
overhead (unless this was improved significantly in more recent releases),
so we shouldn't force users to mask arrays which do not in fact contain
missing data.
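
As a rough sketch of what I mean (this is not the actual ocw code:
propagate_missing_mask is a made-up name, it takes bare arrays rather than
Dataset objects, and it assumes all arrays share one shape), skipping the
unmasked inputs could look something like this:

```python
import numpy as np
import numpy.ma as ma


def propagate_missing_mask(arrays):
    """Mask every array wherever any masked input array is missing data.

    Hypothetical stand-in for the real function: works on bare arrays,
    not OCW Dataset objects, and assumes all shapes match.
    """
    combined = np.zeros(arrays[0].shape, dtype=bool)
    for values in arrays:
        # Plain ndarrays carry no mask, so there is nothing to propagate.
        if not ma.isMaskedArray(values):
            continue
        # getmaskarray always returns a full boolean array, even when the
        # underlying mask is the scalar "nomask".
        combined |= ma.getmaskarray(values)

    # Assumption: if nothing is missing anywhere, hand the inputs back
    # unchanged so nobody pays the masked-array overhead for no reason.
    if not combined.any():
        return list(arrays)
    return [ma.array(values, mask=combined) for values in arrays]
```

The early return when nothing is masked is my own addition for the sake of
the example; the only change I'm actually suggesting is the "continue" for
unmasked inputs.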


Thanks,
Alex

On Sun, Nov 26, 2017 at 3:40 PM, Michael Anderson <
michael.arthur.ander...@gmail.com> wrote:

> Apologies, the method is in data_processor, not utils, as incorrectly stated
> in the header of the previous email.
>
> On Sun, Nov 26, 2017 at 6:29 PM, Michael Anderson <
> michael.arthur.ander...@gmail.com> wrote:
>
> > I'm working on https://issues.apache.org/jira/browse/CLIMATE-797.
> >
> > The comments in the method in question state:
> >
> > If any of dataset in dataset_array has missing values at a grid point,
> > the values at the grid point in all other datasets are masked.
> >
> >
> > The problem here is that the method assumes a masked array is passed as
> > an input.
> >
> > If a regular numpy array (e.g. OCW dataset) is passed, it does not have
> > a mask attribute and an error is thrown
> >
> >
> > 1.  I could tidy up the error handling to make it more clear to the
> > caller that a masked array was expected.
> >
> >
> > 2.  I could check if a mask exists and use that.  In the case of the
> > mask not being supplied, I could carry out the intent of the function and
> > manually check the array for "missing values".  Other than None or NaN, are
> > there any other values that by convention constitute missing?  The netCDF
> > default fill values?
> >
> >
> > Preferences on the approach and / or suggestions on the second approach?
> >
> >
> > Thanks,
> >
> >
> > Michael A. Anderson
> >
> >
> >
> >
> >
>



-- 
Alex Goodman
Data Scientist I
Science Data Modeling and Computing (398K)
Jet Propulsion Laboratory
California Institute of Technology
Tel: +1-818-354-6012
