Please note:
(1) "The assumption so far has been that the datasets in the evaluation
perfectly overlap both spatially and temporally"
>>> There is an exception: if "spatialGrid=user" is set in the config file, the
evaluation domain is specified by the user and both model and obs data are
interpolated onto that user-specified domain.
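For illustration, the interpolation onto a user grid amounts to something like
the following (a minimal numpy/scipy sketch only, not the actual RCMES code;
the grid values and function name are made up):
    import numpy as np
    from scipy.interpolate import RegularGridInterpolator

    # Hypothetical user-specified evaluation grid
    user_lats = np.arange(-40.0, 40.5, 0.5)
    user_lons = np.arange(-20.0, 60.5, 0.5)

    def regrid_to_user_domain(values, src_lats, src_lons):
        # Bilinearly interpolate one 2-D field onto the user-specified grid.
        interp = RegularGridInterpolator((src_lats, src_lons), values,
                                         bounds_error=False, fill_value=np.nan)
        lat2d, lon2d = np.meshgrid(user_lats, user_lons, indexing='ij')
        pts = np.column_stack([lat2d.ravel(), lon2d.ravel()])
        return interp(pts).reshape(lat2d.shape)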
(2) "The assumption so far has been that the datasets in the evaluation
perfectly overlap both spatially and temporally"
>>> Temporally, there is a process that checks the time steps of individual
data files (both model and obs). If the time steps are mixed (e.g., some
daily and some monthly), the code temporally regrids the daily data to
monthly (interpolation in time is not currently allowed).
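For reference, that daily-to-monthly step amounts to a simple monthly mean
(a pandas sketch for illustration only, not the actual code):
    import numpy as np
    import pandas as pd

    def daily_to_monthly(times, values):
        # Average a daily series (time is the first axis of `values`)
        # into monthly means; no interpolation in time is attempted.
        frame = pd.DataFrame(values.reshape(len(times), -1),
                             index=pd.DatetimeIndex(times))
        monthly = frame.resample('MS').mean()   # 'MS' = month start
        new_shape = (len(monthly),) + values.shape[1:]
        return monthly.index.to_numpy(), monthly.to_numpy().reshape(new_shape)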
I use the user-specified domain option exclusively to avoid a problem in the
model datasets: although SMHI (and also NCAR) interpolated the model data onto
the same domain, the lon/lat values in individual netCDF files can vary
because of truncation error. This problem occurred in both the CORDEX Africa
dataset (prepared by SMHI) and NARCCAP (prepared by NCAR).
-----------------------------------------------------------------------------------------------------
Jinwon Kim
Dept. Atmospheric and Oceanic Sciences and
Joint Institute for Regional Earth System Science and Engineering
University of California, Los Angeles
Los Angeles, CA 90095-1565
________________________________________
From: [email protected] [[email protected]] on behalf of Michael Joyce
[[email protected]]
Sent: Tuesday, July 30, 2013 10:34 AM
To: dev
Subject: Re: OCW Refactoring and Subregions
Had a meeting with Paul. Here are the results.
Currently we haven't specified any region information in the pre-eval/eval
step. The assumption so far has been that the datasets in the evaluation
perfectly overlap both spatially and temporally, but there's been no check
for compliance or a prerequisite function in DatasetProcessor (DSP) that
does this operation (at least not that I'm aware of). An Evaluation is only
valid if the Datasets overlap perfectly both spatially and temporally. So
DSP needs a function that takes all the datasets and some bounding
information and spits out datasets with the correct overlaps (or an error
if the request isn't possible). To handle Subregions, the Evaluation
object will take an optional subregions object that changes how the
evaluation is run.
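Very roughly, I'm imagining something like the following for that DSP
function (just a sketch; the bounds fields and the Dataset attribute names
lats/lons/times/values are guesses on my part, not the real API):
>>> import numpy as np
>>>
>>> def subset(bounds, datasets):
>>>     # Trim each dataset to the given spatial/temporal bounds, or raise
>>>     # if a dataset does not cover them.
>>>     trimmed = []
>>>     for ds in datasets:
>>>         if (ds.lats.min() > bounds.latMin or ds.lats.max() < bounds.latMax
>>>                 or ds.lons.min() > bounds.lonMin
>>>                 or ds.lons.max() < bounds.lonMax):
>>>             raise ValueError("Dataset does not cover the requested bounds")
>>>         lat_idx = np.where((ds.lats >= bounds.latMin)
>>>                            & (ds.lats <= bounds.latMax))[0]
>>>         lon_idx = np.where((ds.lons >= bounds.lonMin)
>>>                            & (ds.lons <= bounds.lonMax))[0]
>>>         time_idx = np.where((ds.times >= bounds.start)
>>>                             & (ds.times <= bounds.end))[0]
>>>         ds.values = ds.values[np.ix_(time_idx, lat_idx, lon_idx)]
>>>         ds.times = ds.times[time_idx]
>>>         ds.lats, ds.lons = ds.lats[lat_idx], ds.lons[lon_idx]
>>>         trimmed.append(ds)
>>>     return trimmed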
If there aren't any subregions we end up with:
>>> results =
>>> [ # For a metric
>>> .... [ # For a target dataset
>>> .... .... # Results for the evaluation with the reference dataset
>>> .... ]
>>> ]
If there are subregions we end up with:
>>> results =
>>> [ # For a metric
>>> .... [ # For a target dataset
>>> .... .... [ # For a subregion
>>> .... .... .... # Result for a subregion of the target dataset with
>>> .... .... .... # the reference dataset
>>> .... .... ]
>>> .... ]
>>> ]
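To make the nesting concrete, indexing into the results would work like this
(purely illustrative; the loop variables are made up):
>>> # Without subregions: results[metric_index][target_index]
>>> for m, metric_results in enumerate(results):
>>>     for t, value in enumerate(metric_results):
>>>         print("metric %d, target %d -> %s" % (m, t, value))
>>>
>>> # With subregions: results[metric][target][subregion]
>>> for m, metric_results in enumerate(results):
>>>     for t, target_results in enumerate(metric_results):
>>>         for s, value in enumerate(target_results):
>>>             print("metric %d, target %d, subregion %d -> %s" % (m, t, s, value))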
This means that we need to change a few things. Here's some pseudo-code
showing this:
Model-to-obs (no subregions)
>>> model = local.load("some/fake/path")
>>> obs = rcmed.getDataset(someParamIdOrWhateverWeUseToGrabObservations)
>>>
>>> DSP.regrid(model, obs) # I'm not sure of the exact format we use for
>>> # this, but you get the idea
>>> model, obs = DSP.subset(evalRegion, [model, obs]) # Here evalRegion
>>> # contains the spatial/temporal bounds for the evaluation
>>>
>>> eval = Evaluation(model, obs, Bias()) # Reference dataset, target
>>> # dataset, and metric(s)
>>> eval.run()
Model-to-obs (with subregions)
Here we add subregions. A subregion is effectively a spatial bound over which
the evaluation is run. We don't necessarily need to create a new class for
this, but we need to agree on a way of passing this information. I think the
best way to handle this is for each subregion to be a list of [latMin, lonMin,
latMax, lonMax]. This gives us:
subregionBounds = [[latMin, lonMin, latMax, lonMax], [latMin, lonMin,
latMax, lonMax], ...]
The evaluation is then run over each subregion. For a subregion to be valid
it must be a subset of the evalRegion that the Evaluation is run over.
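A check along those lines could be as simple as this (a sketch; I'm guessing
that evalRegion exposes latMin/lonMin/latMax/lonMax attributes):
>>> def validateSubregions(subregionBounds, evalRegion):
>>>     # Raise if any [latMin, lonMin, latMax, lonMax] subregion falls
>>>     # outside the overall evaluation region.
>>>     for latMin, lonMin, latMax, lonMax in subregionBounds:
>>>         if (latMin < evalRegion.latMin or latMax > evalRegion.latMax
>>>                 or lonMin < evalRegion.lonMin
>>>                 or lonMax > evalRegion.lonMax):
>>>             raise ValueError("Subregion outside the evaluation region: %s"
>>>                              % [latMin, lonMin, latMax, lonMax])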
>>> model = local.load("some/fake/path")
>>> obs = rcmed.getDataset(someParamIdOrWhateverWeUseToGrabObservations)
>>>
>>> DSP.regrid(model, obs) # I'm not sure of the exact format we use for
>>> # this, but you get the idea
>>> model, obs = DSP.subset(evalRegion, [model, obs]) # Here evalRegion
>>> # contains the spatial/temporal bounds for the evaluation
>>>
>>> eval = Evaluation(model, obs, Bias(), subregionBounds) # Reference
>>> # dataset, target dataset, metric(s), subregion bounds
>>> eval.run()
When doing a multi-dataset evaluation the calls change just a bit:
>>> DSP.regrid(model, targetDatasets)
>>> model, targetDatasets = DSP.subset(evalRegion, [model] + targetDatasets)
>>>
>>> eval = Evaluation(model, targetDatasets, someListOfMetrics, subregionBounds)
>>> eval.run()
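For what it's worth, the run loop that produces the nesting above could be as
simple as this (only a sketch; the attribute names and the metric.run() call
are assumptions, and subset() stands for whatever DSP routine trims the
datasets to a subregion's bounds):
>>> def run(self):
>>>     self.results = []
>>>     for metric in self.metrics:
>>>         metricResults = []
>>>         for target in self.targets:
>>>             if not self.subregions:
>>>                 metricResults.append(metric.run(self.ref, target))
>>>             else:
>>>                 subregionResults = []
>>>                 for bounds in self.subregions:
>>>                     subRef, subTarget = subset(bounds, [self.ref, target])
>>>                     subregionResults.append(metric.run(subRef, subTarget))
>>>                 metricResults.append(subregionResults)
>>>         self.results.append(metricResults)
>>>     return self.results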
thoughts?
-- Joyce
On Tue, Jul 30, 2013 at 8:25 AM, Michael Joyce <[email protected]> wrote:
> I think the most important thing that we need to decide on is:
>
> Do we treat subregions of a dataset as a single object? As in:
> >>> aSubregionedDataset = DatasetProcessor.subregion(someSubregions, aDataset)
>
> In this case, what would aSubregionedDataset look like? Would it have a
> list of lat lists? One list for each subregion that was taken from the
> dataset?
>
> Or is a subregion effectively just a dataset? As in:
>
> >>> [firstDataset, secondDataset, ..., nthDataset] =
> >>>     DatasetProcessor.subregion(someNSubregions, aDataset)
>
> If we go with the second approach, I don't see the point in making a
> distinction. If a subregion is really just a subset of a dataset then
> there's no purpose in separating the two. The user should be responsible
> for properly grouping datasets (that happen to be subregions) into the
> Evaluation and passing them in the expected grouping for plotting. In this
> case, the Evaluation object doesn't treat a subregion differently at all.
> It's just a Dataset that gets run through everything like normal.
>
> If the only purpose for making a special distinction between a subregion
> and a dataset is for grouping convenience then we really need to ask
> ourselves if the user should be responsible for handling the grouping so we
> can simplify the system. Personally, I think the user should be responsible
> for this work. However, that's only because as far as I can tell a
> subregion is just a Dataset with an adjusted bounding box. Perhaps I'm
> oversimplifying.
>
>
> -- Joyce
>
>
> On Tue, Jul 30, 2013 at 8:14 AM, Cameron Goodale <[email protected]> wrote:
>
>> I think option 3 is the best given the rationale that has been stated
>> previously.
>>
>> I can add a function to the dataset_processor module that will take in a
>> single Dataset Object and a list of SubRegion Specifications (north, south,
>> east, west, Name), and it could return a tuple of SubRegion objects with a
>> length equal to the number of SubRegion Specs.
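>>
>> Roughly, I am picturing something along these lines (a sketch only; the
>> SubRegion class and the Dataset attribute names are placeholders):
>>     def subregion(subregion_specs, dataset):
>>         # Each spec is assumed to be (north, south, east, west, name).
>>         # Returns a tuple of SubRegion objects, one per spec.
>>         subregions = []
>>         for north, south, east, west, name in subregion_specs:
>>             lat_idx = (dataset.lats >= south) & (dataset.lats <= north)
>>             lon_idx = (dataset.lons >= west) & (dataset.lons <= east)
>>             subregions.append(SubRegion(
>>                 name=name,
>>                 lats=dataset.lats[lat_idx],
>>                 lons=dataset.lons[lon_idx],
>>                 times=dataset.times,
>>                 values=dataset.values[:, lat_idx][:, :, lon_idx]))
>>         return tuple(subregions)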
>>
>> SubClassing Dataset makes sense because a Dataset and SubRegion share
>> common attributes, but after talking with Mike about the two, can a future
>> science user please give me a clear difference between a Dataset and a
>> SubRegion?
>>
>> I hope a SubRegion assumes specific Metrics to be run that cannot be run
>> on a Dataset. I fear that if SubRegion and Dataset are too similar it will
>> merely confuse users (and software engineers) about when to use which one.
>>
>> Can anyone articulate the difference between a Dataset and SubRegion for
>> me?
>>
>>
>> Thanks,
>>
>>
>> Cameron
>>
>>
>> On Mon, Jul 29, 2013 at 12:22 PM, Michael Joyce <[email protected]> wrote:
>>
>> > You covered most everything, Alex.
>> >
>> > I'm a fan of inheriting from Dataset to handle Subregions. The user can
>> > still add the "dataset" the same way to an Evaluation. Then the Evaluation
>> > instance can run a separate eval loop to handle subregions. It makes
>> > Evaluation more complicated, but using a naming convention to designate a
>> > subregion would just be worse, I feel. The DatasetProcessor could have a
>> > function that takes a Dataset and subregion information and spits out a new
>> > SubregionDataset (or some such meaningful name) instance that the user can
>> > add to the Evaluation.
>> >
>> > What does everyone think would be a good way of handling this?
>> >
>> >
>> > -- Joyce
>> >
>> >
>> > On Mon, Jul 29, 2013 at 11:36 AM, Goodman, Alexander (398J-Affiliate) <
>> > [email protected]> wrote:
>> >
>> > > Hi all,
>> > >
>> > > Being able to account for subregions will be a crucial part of running an
>> > > evaluation and making the right plots as part of our OCW refactoring. Mike
>> > > and I had a discussion last Friday on some ways to do this and we both
>> > > thought that the best approach would make use of the Dataset class somehow.
>> > > Some specific ideas we had include:
>> > >
>> > > 1) Designate datasets as subregional by convention. Specifically, this
>> > > could be something like making a new dataset instance with the same name
>> > > as the parent dataset but with the subregion name appended to the end with
>> > > a leading underscore (e.g., name_R01, name_R02).
>> > >
>> > > 2) Values for a particular subregion could be placed in a list or
>> > > dictionary as an attribute of Dataset.
>> > >
>> > > 3) Make a subclass of Dataset explicitly for subregions.
>> > >
>> > > In general, any approach will add an additional complication to some
>> > > component of the new OCW code in that the evaluation results / datasets
>> > > need to get grouped together by subregion.
>> > >
>> > > My preferred approach is (3) since it adds the least amount of
>> > > complication to the plotting. I particularly don't like (1) since
>> > > enforcing a rule by convention would add restrictions on the valid names
>> > > users can give datasets; for example, a dataset name like
>> > > 'TRMM_hourly_precip' would make it difficult to incorporate subregions.
>> > >
>> > > Mike, my memory since our last meeting is a bit fuzzy, so please clarify
>> > > or correct any of my points if I am wrong here. I would like to hear other
>> > > ideas or opinions as to the best approach for the subregion problem.
>> > >
>> > > Thanks,
>> > > Alex
>> > >
>> > > --
>> > > Alex Goodman
>> > >
>> >
>>
>
>