Hi Michael,

The purpose of that function is to propagate missing data from one dataset to all the others so that the evaluation is consistent. Depending on which loader is used, datasets without missing data do not use masked arrays for the values attribute (see CLIMATE-819). I would just modify the function to continue through the loop whenever the current dataset's values attribute is not masked.

Requiring masked arrays as dataset values in all OCW workflows might make these situations easier to avoid, but I think it's a bad idea: numpy's masked arrays are known to carry a significant performance overhead (unless this has improved significantly in more recent releases), so we shouldn't force users to mask arrays that do not in fact contain missing data.
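As a rough sketch of that change (this is not the actual data_process code): the helper below, `propagate_missing_data`, is a hypothetical name, and it assumes every array in the list has the same shape. It skips any array that is not masked while building the combined mask, and it only returns masked arrays when at least one grid point is actually missing.

```python
import numpy as np
import numpy.ma as ma


def propagate_missing_data(values_list):
    """Mask a grid point in every array if it is masked in any array.

    Hypothetical helper for illustration; assumes all arrays share one shape.
    """
    combined = np.zeros(values_list[0].shape, dtype=bool)
    for values in values_list:
        # Plain ndarrays carry no mask, so there is nothing to propagate.
        if not ma.isMaskedArray(values):
            continue
        # getmaskarray always returns a full boolean array, even for nomask.
        combined |= ma.getmaskarray(values)

    # Nothing is missing anywhere: hand the inputs back untouched so that
    # unmasked data never pays the masked-array overhead.
    if not combined.any():
        return list(values_list)

    # Give each output its own copy of the mask so they stay independent.
    return [ma.masked_array(ma.getdata(values), mask=combined.copy())
            for values in values_list]
```

With one plain array and one masked array as input, both outputs come back masked at the same grid points; with only plain arrays, the inputs are returned as they are.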
Thanks,
Alex

On Sun, Nov 26, 2017 at 3:40 PM, Michael Anderson <michael.arthur.ander...@gmail.com> wrote:

> Apologies, the method is in data_process, not utils, as incorrectly stated
> in the header of the previous email.
>
> On Sun, Nov 26, 2017 at 6:29 PM, Michael Anderson <
> michael.arthur.ander...@gmail.com> wrote:
>
> > I'm working on https://issues.apache.org/jira/browse/CLIMATE-797.
> >
> > The comments in the method in question state:
> >
> > If any of dataset in dataset_array has missing values at a grid point,
> > the values at the grid point in all other datasets are masked.
> >
> > The problem here is that the method assumes a masked array is passed as
> > an input.
> >
> > If a regular numpy array (e.g. OCW dataset) is passed, it does not have
> > a mask attribute and an error is thrown
> >
> > 1. I could tidy up the error handling to make it more clear to the
> > caller that a masked array was expected.
> >
> > 2. I could check if a mask exists and use that. In the case of the
> > mask not being supplied, I could carry out the intent of the function and
> > manually check the array for "missing values". Other than None or NaN, are
> > there any other values that by convention constitute missing? The netCDF
> > default fill values?
> >
> > Preferences on the approach and / or suggestions on the second approach?
> >
> > Thanks,
> >
> > Michael A. Anderson

--
Alex Goodman
Data Scientist I
Science Data Modeling and Computing (398K)
Jet Propulsion Laboratory
California Institute of Technology
Tel: +1-818-354-6012