Hi All,

1- Thanks for helping with the naming and structure of OCW.
2- I always like to generate a flow-chart type of diagram for a better and easier understanding of the API structure. Please check [1] for the updated version.
3- I will start coding both "dataset.py" and "local.py" and will send them to Review Board soon.

[1]: https://cwiki.apache.org/confluence/display/CLIMATE/Open+Climate+Workbench+API+summary

Best,
Mazi

On Jun 14, 2013, at 9:00 AM, Ramirez, Paul M (398J) wrote:

>
> Sent from my iPhone
>
> On Jun 14, 2013, at 5:29 AM, "Michael Joyce" <[email protected]> wrote:
>
>> Paul,
>>
>> io vs data_source - I would stick with data_source. Much clearer in my
>> opinion.
>> dataset_processor - still needs to be in there
>>
>> Regarding Evaluation and Metrics. They're interrelated and should be in the
>> same package in my opinion. I also dislike storing a ton of metric
>> functions in metrics. I much prefer the class formatting that we discussed
>> at our last meeting (perhaps I'm just misreading this part though).
>>
>> plotter vs display vs visualization: I like visualization the best
>> personally.
>>
>> As a side note we need to be careful how we name files and various classes.
>> Otherwise our imports are going to be annoying and redundant. Say we want
>> to get the Metric class from the metric file. We end up with
>> ocw.metric.metric which is just ugly.
>
> Don't think this is true if you do an import of the class in the __init__.py.
>
> Also I know we had discussion of Metrics as a class but we don't config them
> now and if we did it could be easier to just pass the metric config to the
> Evaluation. What we need to do is look at what config we would even have in the
> current metrics and balance that with how many knobs we are trying to put in a
> priori. Knobs can be added when requested.
>
>>
>> -- Joyce
>>
>>
>> On Thu, Jun 13, 2013 at 7:40 AM, Ramirez, Paul M (398J) <[email protected]> wrote:
>>
>>> All,
>>>
>>> What about instead of Plotter we collocate the plots into a Display class
>>> or module?
>>>
>>> Agree with Mike on the extra fluff around packages not adding anything and
>>> therefore should be dropped.
>>>
>>> rcmed = new RCMED(database_info)
>>> obs = rcmed.loadObservation(key)
>>> model = local.loadModel(filepath)
>>> metric = [metrics.bias, metrics.pdf]
>>> evaluation = new Evaluation(obs, model, metric)
>>> results = evaluation.run()
>>>
>>> Maybe I'm simplifying this too much but wouldn't the following suffice?
>>>
>>> ocw
>>> ├── __init__.py
>>> ├── dataset.py
>>> ├── display.py
>>> ├── evaluate.py
>>> ├── io
>>> │   ├── __init__.py
>>> │   ├── esg.py
>>> │   ├── local.py
>>> │   └── rcmed.py
>>> └── metrics.py
>>>
>>>
>>> --Paul
>>>
>>>
>>> On 6/12/13 10:16 PM, "Kim, Jinwon" <[email protected]> wrote:
>>>
>>>>
>>>> "Do terms like 'Dataset', 'Evaluation', 'Metric', 'Plotter' make good
>>>> sense?"
>>>> --> these can be taken as 'standard terminology' except 'plotter'.
>>>>
>>>> -----------------------------------------------------------------------------------------------------
>>>> Jinwon Kim
>>>> Dept. Atmospheric and Oceanic Sciences and
>>>> Joint Institute for Regional Earth System Science and Engineering
>>>> University of California, Los Angeles
>>>> Los Angeles, CA 90095-1565
>>>> ________________________________________
>>>> From: [email protected] [[email protected]] on behalf of Cameron
>>>> Goodale [[email protected]]
>>>> Sent: Wednesday, June 12, 2013 10:10 PM
>>>> To: [email protected]
>>>> Subject: Re: Proposed Toolkit Refactoring
>>>>
>>>> I have to agree with Mike on this one, but reserve the right to change my
>>>> mind later ;)
>>>>
>>>> It is a delicate balance between organizing code that is maintainable,
>>>> decoupled, and still retains an API that is easy for humans to read and
>>>> understand. I will always favor direct and concise names over fuzzy or
>>>> ambiguous ones. Naming is hard, period.
>>>>
>>>> I don't like the misc.Dataset.py, that feels clunky and you don't get much
>>>> more fuzzy than 'misc', so we should get that cleaned up.
>>>>
>>>> Thank you Mazi for creating the wiki page so we can all visually see the
>>>> code structure, this is a big help to the project.
>>>>
>>>> I would like to invite any of the science users and/or devs to weigh in on
>>>> this. If the resulting API doesn't make sense to the end users, then we
>>>> have failed (in my opinion).
>>>>
>>>> Question for NON-Computer Scientists:
>>>> Do terms like 'Dataset', 'Evaluation', 'Metric', 'Plotter' make good
>>>> sense?
>>>>
>>>>
>>>> -Cameron
>>>>
>>>>
>>>> On Wed, Jun 12, 2013 at 4:43 PM, Michael Joyce <[email protected]> wrote:
>>>>
>>>>> I find the new structuring (A) to be a bit confusing. I think the part I
>>>>> find confusing is the 'data' part. Combining Dataset, DatasetProcessor,
>>>>> and DataSource into 'Data' seems to cause a bit of ambiguity. Let me see
>>>>> if I can give some examples.
>>>>>
>>>>> ocw.data.content is where I assume that Dataset is defined? The naming
>>>>> doesn't really imply this to me. "content" is a bit ambiguous.
>>>>>
>>>>> ocw.data.processing is fine but personally I find ocw.dataset_processor
>>>>> to be more clear. Makes it seem like you have this object
>>>>> (DatasetProcessor) that lets you "process datasets". The first one says
>>>>> to me "O, I can process data...what does that mean?".
>>>>>
>>>>> In my opinion, data_source.rcmed.getDataset() is more understandable
>>>>> than data.retrieve.rcmed.getDataset(). It makes it clear that 'rcmed' is
>>>>> a datasource from which you can get a dataset. I think the second one does
>>>>> this as well but not as clearly as the first. This could certainly be
>>>>> fixed by changing some of the naming (Maybe
>>>>> data.sources.rcmed.getDataset()) but then why bother with the extra level
>>>>> of nesting if it's not doing much/anything?
>>>>>
>>>>> Lastly, why have Plot.plotting.py? That extra directory
>>>>> doesn't accomplish anything outside of adding another level of nesting.
>>>>> If we plan on adding more 'Plot' related modules then I would say go for
>>>>> it, but Plot.plotting seems unnecessarily redundant given that
>>>>> plotting.py is the only module in Plot.
>>>>> --
>>>>>
>>>>> I will say that I'm not completely sold on misc.dataset for defining the
>>>>> "Dataset" class. Other than that I prefer the structuring we came up
>>>>> with Monday over the new one. I don't feel that the new structuring helps
>>>>> with the Dataset problem enough to warrant the changes it makes elsewhere.
>>>>>
>>>>>
>>>>> -- Joyce
>>>>>
>>>>>
>>>>> On Wed, Jun 12, 2013 at 1:21 PM, Boustani, Maziyar (398F) <[email protected]> wrote:
>>>>>
>>>>>> Hi All,
>>>>>>
>>>>>> On Monday this week Cam, Mike and I had a 30 min talk about refactoring
>>>>>> the RCMES code and coming up with a code structure.
>>>>>> On the wiki page [1] there are two code structures, structure A and B.
>>>>>> Structure (B) was the one we came up with in Monday's talk.
>>>>>> Since then I have been trying to make some improvements on that and came
>>>>>> up with structure (A).
>>>>>> Most of the improvements were about making the naming easier for users
>>>>>> to understand and the structure simpler.
>>>>>> For example:
>>>>>> (B)                 (A)
>>>>>> misc.Dataset      = Data.content
>>>>>> DataSource.local  = Data.retrieve.local
>>>>>> DatasetProcessor  = Data.process
>>>>>>
>>>>>> Here are some "import" examples we can have with the new structure:
>>>>>> import Data.content
>>>>>> import Data.process
>>>>>> import Data.retrieve.local
>>>>>> import Data.retrieve.rcmed
>>>>>>
>>>>>> The Review Board request for this Python code will come up soon.
>>>>>>
>>>>>> Thoughts?
>>>>>>
>>>>>> Best,
>>>>>> Mazi
>>>>>>
>>>>>> [1]: https://cwiki.apache.org/confluence/display/CLIMATE/Open+Climate+Workbench+API+summary
>>>>>>
>>>>>>
>>>>>> On Jun 6, 2013, at 9:31 AM, Boustani, Maziyar (398F) wrote:
>>>>>>
>>>>>>> Hi All,
>>>>>>>
>>>>>>> Regarding the RCMES refactoring API code (Toolkit), I thought it would
>>>>>>> be good to have a wiki page that summarizes the API we are going to
>>>>>>> have for RCMES in the future.
>>>>>>> This is not the actual document we will have later for the RCMES code,
>>>>>>> but it is just the list of classes, modules, methods and functions we
>>>>>>> may need to develop.
>>>>>>> It would be great if you guys help me to complete this wiki before we
>>>>>>> start refactoring the toolkit's code.
>>>>>>> https://cwiki.apache.org/confluence/display/CLIMATE/Open+Climate+Workbench+API+summary
>>>>>>>
>>>>>>> Best,
>>>>>>> Mazi
>>>>>>>
>>>>>>>
>>>>>>> On Jun 5, 2013, at 7:40 AM, Michael Joyce wrote:
>>>>>>>
>>>>>>>> +1 for cutting 0.1-incubating and starting these changes in 0.2.
>>>>>>>>
>>>>>>>>
>>>>>>>> -- Joyce
>>>>>>>>
>>>>>>>>
>>>>>>>> On Wed, Jun 5, 2013 at 7:19 AM, Mattmann, Chris A (398J) <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> This sounds like a good path to proceed down to me.
>>>>>>>>>
>>>>>>>>> I would formulate the below into a set of JIRA issues,
>>>>>>>>> then proceed by incrementally evolving the toolkit to
>>>>>>>>> support this.
>>>>>>>>>
>>>>>>>>> The only catch is that many of these could potentially
>>>>>>>>> be API back compat. Since we haven't really talked or
>>>>>>>>> suggested about the impact of this; nor made a release,
>>>>>>>>> it's certainly possible to do this in trunk.
>>>>>>>>>
>>>>>>>>> My suggestion though since trunk represents what we
>>>>>>>>> all believe to be RCMET 2.0 API compat, we should probably
>>>>>>>>> create a branch for this. Or, better yet:
>>>>>>>>>
>>>>>>>>> 1. Close out current JIRA issues for 0.1-incubating.
>>>>>>>>> 2. Cut a 0.1-incubating RC/release process.
>>>>>>>>> 3. Start to implement the below in 0.2-incubating.
>>>>>>>>>
>>>>>>>>> Thoughts?
>>>>>>>>>
>>>>>>>>> Cheers,
>>>>>>>>> Chris
>>>>>>>>>
>>>>>>>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>>>>>>> Chris Mattmann, Ph.D.
>>>>>>>>> Senior Computer Scientist
>>>>>>>>> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
>>>>>>>>> Office: 171-266B, Mailstop: 171-246
>>>>>>>>> Email: [email protected]
>>>>>>>>> WWW: http://sunset.usc.edu/~mattmann/
>>>>>>>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>>>>>>> Adjunct Assistant Professor, Computer Science Department
>>>>>>>>> University of Southern California, Los Angeles, CA 90089 USA
>>>>>>>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> -----Original Message-----
>>>>>>>>> From: Michael Joyce <[email protected]>
>>>>>>>>> Reply-To: "[email protected]" <[email protected]>
>>>>>>>>> Date: Wednesday, June 5, 2013 6:56 AM
>>>>>>>>> To: dev <[email protected]>
>>>>>>>>> Subject: Proposed Toolkit Refactoring
>>>>>>>>>
>>>>>>>>>> All,
>>>>>>>>>>
>>>>>>>>>> This is a brief rundown of a discussion that Paul, Cam, Mazi, and I
>>>>>>>>>> had yesterday regarding the current state of the toolkit and proposed
>>>>>>>>>> changes that we would like to discuss with the list.
>>>>>>>>>>
>>>>>>>>>> We discussed adding a number of objects that should help simplify
>>>>>>>>>> toolkit usage. Below is a high-level rundown of our discussion.
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>>
>>>>>>>>>> Dataset: Simple container object for a dataset. Provides helpers for
>>>>>>>>>> accessing relevant data (getLatsLons, getTime) and convenience
>>>>>>>>>> functions (writeToFile()).
>>>>>>>>>>
>>>>>>>>>> DataSource: Provides the user with helper functions for grabbing the
>>>>>>>>>> data that they want to evaluate. There's a RCMED module specifically
>>>>>>>>>> for grabbing RCMED data and a Local module for grabbing local data.
>>>>>>>>>> This could easily be expanded to include ESG and other data sources.
>>>>>>>>>>
>>>>>>>>>> DatasetProcessor: Any operation that needs to be run on datasets
>>>>>>>>>> (that isn't the evaluation obviously) is found in the
>>>>>>>>>> DatasetProcessor. It supports:
>>>>>>>>>> - regridding (spatial and temporal)
>>>>>>>>>> - masking/cleaning/filtering
>>>>>>>>>> - subsetting (spatial and temporal)
>>>>>>>>>> - ensemble generation
>>>>>>>>>> - anything else that fits here.
>>>>>>>>>>
>>>>>>>>>> Evaluation: The Evaluation object is (surprise surprise) in charge of
>>>>>>>>>> running Evaluations. It keeps track of the datasets (both 'reference'
>>>>>>>>>> and the 'targets') that the user wants to use in the evaluation. It
>>>>>>>>>> runs all the necessary evaluations and keeps the results nicely stored
>>>>>>>>>> and readily accessible for the user.
>>>>>>>>>>
>>>>>>>>>> Metric: Metrics are added to an Evaluation and used during the run.
>>>>>>>>>> All metrics inherit from the base Metric class. All you need to add
>>>>>>>>>> new metrics is inherit from Metric and override the 'run' method.
>>>>>>>>>>
>>>>>>>>>> Plotter: The Plotter makes result visualization a breeze. If you give
>>>>>>>>>> it an Evaluation object it will spit out plots of all the results.
>>>>>>>>>> Give it a Dataset and it will spit out a plot. You can even have it
>>>>>>>>>> return Matplotlib objects so you can make your results look exactly
>>>>>>>>>> the way you'd like.
>>>>>>>>>>
>>>>>>>>>> -- Joyce
>>>
>>>
>
> --Paul
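
For reference, here is a minimal Python sketch of how the Metric and Evaluation objects described in the June 5 proposal might fit together. Only the idea that metrics inherit from a base Metric class and override 'run', and that an Evaluation tracks a reference dataset, target datasets and stored results, comes from the thread. The Dataset constructor, the Bias definition (mean of target minus reference) and the layout of the results dictionary are hypothetical and are not the actual OCW API.

```python
# Hypothetical sketch only: names and signatures are assumptions made for
# illustration, not the real OCW toolkit API.


class Dataset:
    """Simple container for a named series of values (assumed layout)."""

    def __init__(self, name, values):
        self.name = name
        self.values = list(values)


class Metric:
    """Base class for metrics; subclasses override run()."""

    def run(self, reference, target):
        raise NotImplementedError("subclasses must override run()")


class Bias(Metric):
    """Mean of (target - reference); an assumed definition of bias."""

    def run(self, reference, target):
        diffs = [t - r for r, t in zip(reference.values, target.values)]
        return sum(diffs) / len(diffs)


class Evaluation:
    """Runs every metric for every target against one reference dataset."""

    def __init__(self, reference, targets, metrics):
        self.reference = reference
        self.targets = list(targets)
        self.metrics = list(metrics)
        self.results = {}  # keyed by (target name, metric class name)

    def run(self):
        for target in self.targets:
            for metric in self.metrics:
                key = (target.name, type(metric).__name__)
                self.results[key] = metric.run(self.reference, target)
        return self.results


obs = Dataset("obs", [1.0, 2.0, 3.0])
model = Dataset("model", [1.5, 2.5, 3.5])
evaluation = Evaluation(obs, [model], [Bias()])
results = evaluation.run()
print(results)
```

In this sketch a new metric only needs a subclass with its own run(); the Evaluation never changes. Whether metrics should also carry configuration, as raised in Paul's reply, is left open here.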
