"Do terms like 'Dataset', 'Evaluation', 'Metric', 'Plotter' make good sense?" --> these can be taken as 'standard terminology' except 'plotter'.
-----------------------------------------------------------------------------------------------------
Jinwon Kim
Dept. Atmospheric and Oceanic Sciences and
Joint Institute for Regional Earth System Science and Engineering
University of California, Los Angeles
Los Angeles, CA 90095-1565
________________________________________
From: [email protected] [[email protected]] on behalf of Cameron Goodale [[email protected]]
Sent: Wednesday, June 12, 2013 10:10 PM
To: [email protected]
Subject: Re: Proposed Toolkit Refactoring

I have to agree with Mike on this one, but reserve the right to change my
mind later ;)

It is a delicate balance between organizing code that is maintainable,
decoupled, and still retains an API that is easy for humans to read and
understand. I will always favor direct and concise names over fuzzy or
ambiguous ones. Naming is hard, period.

I don't like misc.Dataset.py; it feels clunky, and you don't get much more
fuzzy than 'misc', so we should get that cleaned up.

Thank you Mazi for creating the wiki page so we can all visually see the
code structure, this is a big help to the project.

I would like to invite any of the science users and/or devs to weigh in on
this. If the resulting API doesn't make sense to the end users, then we have
failed (in my opinion).

Question for NON-Computer Scientists:
Do terms like 'Dataset', 'Evaluation', 'Metric', 'Plotter' make good sense?

-Cameron

On Wed, Jun 12, 2013 at 4:43 PM, Michael Joyce <[email protected]> wrote:

> I find the new structuring (A) to be a bit confusing. I think the part I
> find confusing is the 'data' part. Combining Dataset, DatasetProcessor, and
> DataSource into 'Data' seems to cause a bit of ambiguity. Let me see if I
> can give some examples.
>
> ocw.data.content is where I assume that Dataset is defined? The naming
> doesn't really imply this to me. "content" is a bit ambiguous.
>
> ocw.data.processing is fine but personally I find ocw.dataset_processor to
> be more clear.
> Makes it seem like you have this object (DatasetProcessor)
> that lets you "process datasets". The first one says to me "O, I can
> process data...what does that mean?".
>
> In my opinion, data_source.rcmed.getDataset() is more understandable than
> data.retrieve.rcmed.getDataset(). It makes it clear that 'rcmed' is a
> datasource from which you can get a dataset. I think the second one does
> this as well but not as clearly as the first. This could certainly be fixed
> by changing some of the naming (Maybe data.sources.rcmed.getDataset()) but
> then why bother with the extra level of nesting if it's not doing
> much/anything?
>
> Lastly, why have Plot.plotting.py? That extra directory
> doesn't accomplish anything outside of adding another level of nesting. If
> we plan on adding more 'Plot' related modules then I would say go for it,
> but Plot.plotting seems unnecessarily redundant given that plotting.py is
> the only module in Plot.
> --
>
> I will say that I'm not completely sold on misc.dataset for defining the
> "Dataset" class. Other than that I prefer the structuring we came up with
> Monday over the new one. I don't feel that the new structuring helps with
> the Dataset problem enough to warrant the changes it makes elsewhere.
>
>
> -- Joyce
>
>
> On Wed, Jun 12, 2013 at 1:21 PM, Boustani, Maziyar (398F) <[email protected]> wrote:
>
> > Hi All,
> >
> > On Monday this week Cam, Mike, and I had a 30-minute talk about
> > refactoring the RCMES code and coming up with a code structure.
> > On the wiki page [1] there are two code structures, structure A and B.
> > Structure (B) was the one we came up with in Monday's talk.
> > Since then I have been trying to make some improvements on that and came
> > up with structure (A).
> > Most of the improvements were aimed at making the naming easier for
> > users to understand and the structure simpler.
> > For example:
> >        (B)                  (A)
> > misc.Dataset      =  Data.content
> > DataSource.local  =  Data.retrieve.local
> > DatasetProcessor  =  Data.process
> >
> > Here are some "import" examples we can have with the new structure:
> > import Data.content
> > import Data.process
> > import Data.retrieve.local
> > import Data.retrieve.rcmed
> >
> > The Review Board for this Python code will come up soon.
> >
> > Thoughts?
> >
> > Best,
> > Mazi
> >
> > [1]: https://cwiki.apache.org/confluence/display/CLIMATE/Open+Climate+Workbench+API+summary
> >
> >
> > On Jun 6, 2013, at 9:31 AM, Boustani, Maziyar (398F) wrote:
> >
> > > Hi All,
> > >
> > > Regarding the RCMES API refactoring (Toolkit), I thought it would be
> > > good to have a wiki page that summarizes the API we are going to have
> > > for RCMES in the future.
> > > This is not the actual document we will have later for the RCMES code;
> > > it is just the list of classes, modules, methods, and functions we may
> > > need to develop.
> > > It would be great if you could help me complete this wiki page before
> > > we start refactoring the toolkit's code.
> > >
> > > https://cwiki.apache.org/confluence/display/CLIMATE/Open+Climate+Workbench+API+summary
> > >
> > > Best,
> > > Mazi
> > >
> > >
> > > On Jun 5, 2013, at 7:40 AM, Michael Joyce wrote:
> > >
> > >> +1 for cutting 0.1-incubating and starting these changes in 0.2.
> > >>
> > >>
> > >> -- Joyce
> > >>
> > >>
> > >> On Wed, Jun 5, 2013 at 7:19 AM, Mattmann, Chris A (398J) <[email protected]> wrote:
> > >>
> > >>> This sounds like a good path to proceed down to me.
> > >>>
> > >>> I would formulate the below into a set of JIRA issues,
> > >>> then proceed by incrementally evolving the toolkit to
> > >>> support this.
> > >>>
> > >>> The only catch is that many of these could potentially
> > >>> be API back compat.
> > >>> Since we haven't really talked about
> > >>> the impact of this, nor made a release,
> > >>> it's certainly possible to do this in trunk.
> > >>>
> > >>> My suggestion though since trunk represents what we
> > >>> all believe to be RCMET 2.0 API compat, we should probably
> > >>> create a branch for this. Or, better yet:
> > >>>
> > >>> 1. Close out current JIRA issues for 0.1-incubating.
> > >>> 2. Cut a 0.1-incubating RC/release process.
> > >>> 3. Start to implement the below in 0.2-incubating.
> > >>>
> > >>> Thoughts?
> > >>>
> > >>> Cheers,
> > >>> Chris
> > >>>
> > >>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> > >>> Chris Mattmann, Ph.D.
> > >>> Senior Computer Scientist
> > >>> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> > >>> Office: 171-266B, Mailstop: 171-246
> > >>> Email: [email protected]
> > >>> WWW: http://sunset.usc.edu/~mattmann/
> > >>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> > >>> Adjunct Assistant Professor, Computer Science Department
> > >>> University of Southern California, Los Angeles, CA 90089 USA
> > >>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> > >>>
> > >>>
> > >>> -----Original Message-----
> > >>> From: Michael Joyce <[email protected]>
> > >>> Reply-To: "[email protected]" <[email protected]>
> > >>> Date: Wednesday, June 5, 2013 6:56 AM
> > >>> To: dev <[email protected]>
> > >>> Subject: Proposed Toolkit Refactoring
> > >>>
> > >>>> All,
> > >>>>
> > >>>> This is a brief rundown of a discussion that Paul, Cam, Mazi, and I
> > >>>> had yesterday regarding the current state of the toolkit and proposed
> > >>>> changes that we would like to discuss with the list.
> > >>>>
> > >>>> We discussed adding a number of objects that should help simplify
> > >>>> toolkit usage. Below is a high-level rundown of our discussion.
> > >>>>
> > >>>> --
> > >>>>
> > >>>> Dataset: Simple container object for a dataset. Provides helpers for
> > >>>> accessing relevant data (getLatsLons, getTime) and convenience
> > >>>> functions (writeToFile()).
> > >>>>
> > >>>> DataSource: Provides the user with helper functions for grabbing the
> > >>>> data that they want to evaluate. There's a RCMED module specifically
> > >>>> for grabbing RCMED data and a Local module for grabbing local data.
> > >>>> This could easily be expanded to include ESG and other data sources.
> > >>>>
> > >>>> DatasetProcessor: Any operation that needs to be run on datasets
> > >>>> (that isn't the evaluation obviously) is found in the
> > >>>> DatasetProcessor. It supports:
> > >>>> - regridding (spatial and temporal)
> > >>>> - masking/cleaning/filtering
> > >>>> - subsetting (spatial and temporal)
> > >>>> - ensemble generation
> > >>>> - anything else that fits here.
> > >>>>
> > >>>> Evaluation: The Evaluation object is (surprise surprise) in charge of
> > >>>> running Evaluations. It keeps track of the datasets (both 'reference'
> > >>>> and the 'targets') that the user wants to use in the evaluation. It
> > >>>> runs all the necessary evaluations and keeps the results nicely
> > >>>> stored and readily accessible for the user.
> > >>>>
> > >>>> Metric: Metrics are added to an Evaluation and used during the run.
> > >>>> All metrics inherit from the base Metric class. All you need to add
> > >>>> new metrics is inherit from Metric and override the 'run' method.
> > >>>>
> > >>>> Plotter: The Plotter makes result visualization a breeze. If you give
> > >>>> it an Evaluation object it will spit out plots of all the results.
> > >>>> Give it a Dataset and it will spit out a plot. You can even have it
> > >>>> return Matplotlib objects so you can make your results look exactly
> > >>>> the way you'd like.
> > >>>>
> > >>>> -- Joyce
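
For anyone skimming the thread, the Metric/Evaluation relationship described in the original proposal (metrics inherit from a base Metric class and override 'run'; an Evaluation holds a reference dataset, the targets, and the results) could be sketched roughly as below. This is only an illustration under stated assumptions, not the toolkit's code: the MeanBias metric, the add_metric method, the results layout, and the use of plain Python lists in place of Dataset objects are all hypothetical.

```python
# Hypothetical sketch of the Metric / Evaluation design described in the
# thread. None of these names or signatures come from the actual toolkit;
# plain lists of floats stand in for Dataset objects.


class Metric:
    """Base class for metrics: subclasses override run()."""

    def run(self, reference, target):
        raise NotImplementedError("Subclasses must override run()")


class MeanBias(Metric):
    """Illustrative metric: mean of (target - reference) over paired values."""

    def run(self, reference, target):
        if len(reference) != len(target):
            raise ValueError("reference and target must have the same length")
        diffs = [t - r for r, t in zip(reference, target)]
        return sum(diffs) / len(diffs)


class Evaluation:
    """Tracks one reference dataset, the targets, the metrics, and results."""

    def __init__(self, reference, targets):
        self.reference = reference
        self.targets = list(targets)
        self.metrics = []
        self.results = []

    def add_metric(self, metric):
        self.metrics.append(metric)

    def run(self):
        # results[i][j] holds metric i evaluated against target j.
        self.results = [
            [metric.run(self.reference, target) for target in self.targets]
            for metric in self.metrics
        ]
        return self.results


# Usage: one reference, two targets, one metric.
evaluation = Evaluation(
    reference=[1.0, 2.0, 3.0],
    targets=[[2.0, 3.0, 4.0], [1.0, 2.0, 3.0]],
)
evaluation.add_metric(MeanBias())
results = evaluation.run()
```

Keeping the results on the Evaluation object, indexed by metric and then by target, is one way to make them "readily accessible" to something like the Plotter; the real layout is a design decision for the refactoring.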
