I think the most important thing we need to decide is this: do we treat the subregions of a dataset as a single object? As in:

>>> aSubregionedDataset = DatasetProcessor.subregion(someSubregions, aDataset)

In this case, what would aSubregionedDataset look like? Would it have a list of lat lists, one for each subregion taken from the dataset? Or is a subregion effectively just a dataset? As in:

>>> [firstDataset, secondDataset, ..., nthDataset] = DatasetProcessor.subregion(someNSubregions, aDataset)

If we go with the second approach, I don't see the point in making a distinction. If a subregion is really just a subset of a dataset, then there's no purpose in separating the two. The user should be responsible for properly grouping datasets (that happen to be subregions) into the Evaluation and passing them in the expected grouping for plotting. In this case, the Evaluation object doesn't treat a subregion differently at all. It's just a Dataset that gets run through everything like normal.

If the only purpose of making a special distinction between a subregion and a dataset is grouping convenience, then we really need to ask ourselves whether the user should be responsible for handling the grouping so we can simplify the system. Personally, I think the user should be responsible for this work. However, that's only because, as far as I can tell, a subregion is just a Dataset with an adjusted bounding box. Perhaps I'm oversimplifying.

-- Joyce

On Tue, Jul 30, 2013 at 8:14 AM, Cameron Goodale wrote:
> I think option 3 is the best given the rationale that has been stated previously.
>
> I can add a function to the dataset_processor module that will take in a single Dataset object and a list of SubRegion specifications (north, south, east, west, Name), and it could return a tuple of SubRegion objects with a length equal to the number of SubRegion specifications.
>
> Subclassing Dataset makes sense because a Dataset and a SubRegion share common attributes, but after talking with Mike about the two, can a future science user please give me a clear difference between a Dataset and a SubRegion?
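To make the second approach concrete, here is a minimal sketch assuming a simplified `Dataset` with 1-D `lats`/`lons` axes and a `(time, lat, lon)` `values` array. The `Dataset` and `SubRegionSpec` classes and the `subregion` function are hypothetical stand-ins, not the actual OCW API. It takes one Dataset plus a list of (north, south, east, west, Name) specifications, as Cameron describes, but returns ordinary Datasets clipped to each bounding box, in the same order as the specifications.

```python
from dataclasses import dataclass, replace
from typing import NamedTuple, Sequence, Tuple

import numpy as np


@dataclass
class Dataset:
    """Hypothetical stand-in for the OCW Dataset (not the real class)."""
    lats: np.ndarray    # 1-D latitude axis
    lons: np.ndarray    # 1-D longitude axis
    values: np.ndarray  # shape (time, lat, lon)
    name: str = ""


class SubRegionSpec(NamedTuple):
    """Hypothetical bounding box, mirroring (north, south, east, west, Name)."""
    north: float
    south: float
    east: float
    west: float
    name: str


def subregion(dataset: Dataset, specs: Sequence[SubRegionSpec]) -> Tuple[Dataset, ...]:
    """Return one plain Dataset per spec, in the same order as `specs`."""
    results = []
    for spec in specs:
        # Boolean masks for the grid points inside the box (bounds inclusive)
        lat_mask = (dataset.lats >= spec.south) & (dataset.lats <= spec.north)
        lon_mask = (dataset.lons >= spec.west) & (dataset.lons <= spec.east)
        # Copy the Dataset with clipped axes; values are sliced on the
        # lat axis first, then on the lon axis
        results.append(replace(
            dataset,
            lats=dataset.lats[lat_mask],
            lons=dataset.lons[lon_mask],
            values=dataset.values[:, lat_mask][:, :, lon_mask],
            name=spec.name,
        ))
    return tuple(results)
```

Because each result is an ordinary Dataset, grouping for the Evaluation and for plotting is left to the user, as suggested above. The sketch also ignores boxes that cross the antimeridian.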
>
> I hope a SubRegion assumes specific Metrics to be run that cannot be run on a Dataset. I fear that if SubRegion and Dataset are too similar, it will merely confuse users (and software engineers) about when to use which one.
>
> Can anyone articulate the difference between a Dataset and a SubRegion for me?
>
> Thanks,
>
> Cameron
>
> On Mon, Jul 29, 2013 at 12:22 PM, Michael Joyce wrote:
> > You covered almost everything, Alex.
> >
> > I'm a fan of inheriting from Dataset to handle Subregions. The user can still add the "dataset" to an Evaluation the same way. Then the Evaluation instance can run a separate eval loop to handle subregions. It makes Evaluation more complicated, but I feel that using a naming convention to designate a subregion would just be worse. The DatasetProcessor could have a function that takes a Dataset and subregion information and spits out a new SubregionDataset (or some such meaningful name) instance that the user can add to the Evaluation.
> >
> > What does everyone think would be a good way of handling this?
> >
> > -- Joyce
> >
> > On Mon, Jul 29, 2013 at 11:36 AM, Goodman, Alexander (398J-Affiliate) wrote:
> > > Hi all,
> > >
> > > Being able to account for subregions will be a crucial part of running an evaluation and making the right plots as part of our OCW refactoring. Mike and I had a discussion last Friday on some ways to do this, and we both thought that the best approach would make use of the Dataset class somehow. Some specific ideas we had include:
> > >
> > > 1) Designate datasets as subregional by convention. Specifically, this could be something like making a new dataset instance with the same name as the parent dataset but with the subregion name appended to the end with a leading underscore (e.g. name_R01, name_R02).
> > >
> > > 2) Values for a particular subregion could be placed in a list or dictionary as an attribute of Dataset.
> > >
> > > 3) Make a subclass of Dataset explicitly for subregions.
> > >
> > > In general, any approach will add some complication to a component of the new OCW code, because the evaluation results / datasets need to be grouped together by subregion.
> > >
> > > My preferred approach is (3) since it adds the least complication to the plotting. I particularly don't like (1) since enforcing a rule by convention would restrict which names are valid for datasets. For example, a dataset name like 'TRMM_hourly_precip' already contains underscores, which would make it difficult to incorporate subregions.
> > >
> > > Mike, my memory of our last meeting is a bit fuzzy, so please clarify or correct any of my points if I am wrong here. I would like to hear other ideas or opinions on the best approach to the subregion problem.
> > >
> > > Thanks,
> > > Alex
> > >
> > > --
> > > Alex Goodman
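For comparison, here is a minimal sketch of option (3), assuming hypothetical `Dataset` and `SubRegionDataset` classes (not the OCW API). The subclass only adds a `region` attribute. A hypothetical `group_by_region` helper, which an Evaluation could use for the separate subregion loop Mike describes or which plotting code could use for grouping, relies on `isinstance()` rather than parsing dataset names.

```python
from collections import defaultdict
from dataclasses import dataclass

import numpy as np


@dataclass
class Dataset:
    """Hypothetical, simplified stand-in for the OCW Dataset class."""
    name: str
    values: np.ndarray


@dataclass
class SubRegionDataset(Dataset):
    """Option 3: a Dataset subclass that records which subregion it covers."""
    region: str = ""


def group_by_region(datasets):
    """Split datasets into whole-domain ones and per-subregion groups.

    Uses isinstance() instead of parsing names, so a name such as
    'TRMM_hourly_precip' is never mistaken for a subregion (the
    problem with option 1).
    """
    whole_domain = []
    by_region = defaultdict(list)
    for ds in datasets:
        if isinstance(ds, SubRegionDataset):
            by_region[ds.region].append(ds)
        else:
            whole_domain.append(ds)
    return whole_domain, dict(by_region)
```

Under option (1), by contrast, splitting 'TRMM_hourly_precip' at its last underscore yields 'TRMM_hourly' plus a bogus region 'precip', which is the ambiguity Alex points out.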
