On Jun 14, 2013, at 7:30 AM, "Kim, Jinwon" <[email protected]> wrote:
> How about 'visualize.py' in place of 'display.py'? 'Visualize' may be more
> common than 'display' in the climate community (e.g., visualization is one
> of the main themes in computing projects related to climate research).

+1

> To me, the current module metrics.py may need to be split, as it is getting
> very long (>1400 lines in my version, which was substantially cleaned up
> from version 2.0).

Agreed. I'll take a stab at this.

>
> -----------------------------------------------------------------------
> Jinwon Kim
> Dept. Atmospheric and Oceanic Sciences and
> Joint Institute for Regional Earth System Science and Engineering
> University of California, Los Angeles
> Los Angeles, CA 90095-1565
> ________________________________________
> From: Ramirez, Paul M (398J) [[email protected]]
> Sent: Thursday, June 13, 2013 7:40 AM
> To: [email protected]
> Subject: Re: Proposed Toolkit Refactoring
>
> All,
>
> What about, instead of Plotter, we collocate the plots into a Display class
> or module?
>
> Agree with Mike that the extra fluff around packages isn't adding anything
> and should therefore be dropped.
>
> rcmed = new RCMED(database_info)
> obs = rcmed.loadObservation(key)
> model = local.loadModel(filepath)
> metric = [metrics.bias, metrics.pdf]
> evaluation = new Evaluation(obs, model, metric)
> results = evaluation.run()
>
> Maybe I'm simplifying this too much but wouldn't the following suffice?
>
> ocw
> ├── __init__.py
> ├── dataset.py
> ├── display.py
> ├── evaluate.py
> ├── io
> │   ├── __init__.py
> │   ├── esg.py
> │   ├── local.py
> │   └── rcmed.py
> └── metrics.py
>
> --Paul
>
> On 6/12/13 10:16 PM, "Kim, Jinwon" <[email protected]> wrote:
>
>> "Do terms like 'Dataset', 'Evaluation', 'Metric', 'Plotter' make good
>> sense?"
>> --> these can be taken as 'standard terminology' except 'plotter'.
>>
>> -----------------------------------------------------------------------
>> Jinwon Kim
>> Dept. Atmospheric and Oceanic Sciences and
>> Joint Institute for Regional Earth System Science and Engineering
>> University of California, Los Angeles
>> Los Angeles, CA 90095-1565
>> ________________________________________
>> From: [email protected] [[email protected]] on behalf of Cameron
>> Goodale [[email protected]]
>> Sent: Wednesday, June 12, 2013 10:10 PM
>> To: [email protected]
>> Subject: Re: Proposed Toolkit Refactoring
>>
>> I have to agree with Mike on this one, but reserve the right to change my
>> mind later ;)
>>
>> It is a delicate balance between organizing code that is maintainable,
>> decoupled, and still retains an API that is easy for humans to read and
>> understand. I will always favor direct and concise names over fuzzy or
>> ambiguous ones. Naming is hard, period.
>>
>> I don't like misc.Dataset.py; it feels clunky, and you don't get much
>> fuzzier than 'misc', so we should get that cleaned up.
>>
>> Thank you Mazi for creating the wiki page so we can all visually see the
>> code structure; this is a big help to the project.
>>
>> I would like to invite any of the science users and/or devs to weigh in on
>> this. If the resulting API doesn't make sense to the end users, then we
>> have failed (in my opinion).
>>
>> Question for NON-Computer Scientists:
>> Do terms like 'Dataset', 'Evaluation', 'Metric', 'Plotter' make good
>> sense?
>>
>> -Cameron
>>
>> On Wed, Jun 12, 2013 at 4:43 PM, Michael Joyce <[email protected]> wrote:
>>
>>> I find the new structuring (A) to be a bit confusing. I think the part I
>>> find confusing is the 'data' part. Combining Dataset, DatasetProcessor,
>>> and DataSource into 'Data' seems to cause a bit of ambiguity. Let me see
>>> if I can give some examples.
>>>
>>> ocw.data.content is where I assume that Dataset is defined? The naming
>>> doesn't really imply this to me. "content" is a bit ambiguous.
>>>
>>> ocw.data.processing is fine, but personally I find ocw.dataset_processor
>>> to be more clear. It makes it seem like you have this object
>>> (DatasetProcessor) that lets you "process datasets". The first one says
>>> to me "Oh, I can process data...what does that mean?".
>>>
>>> In my opinion, data_source.rcmed.getDataset() is more understandable than
>>> data.retrieve.rcmed.getDataset(). It makes it clear that 'rcmed' is a
>>> datasource from which you can get a dataset. I think the second one does
>>> this as well, but not as clearly as the first. This could certainly be
>>> fixed by changing some of the naming (maybe
>>> data.sources.rcmed.getDataset()), but then why bother with the extra
>>> level of nesting if it's not doing much/anything?
>>>
>>> Lastly, why have Plot.plotting.py? That extra directory doesn't
>>> accomplish anything outside of adding another level of nesting. If we
>>> plan on adding more 'Plot' related modules then I would say go for it,
>>> but Plot.plotting seems unnecessarily redundant given that plotting.py is
>>> the only module in Plot.
>>> --
>>>
>>> I will say that I'm not completely sold on misc.dataset for defining the
>>> "Dataset" class. Other than that, I prefer the structuring we came up
>>> with on Monday over the new one. I don't feel that the new structuring
>>> helps with the Dataset problem enough to warrant the changes it makes
>>> elsewhere.
>>>
>>> -- Joyce
>>>
>>> On Wed, Jun 12, 2013 at 1:21 PM, Boustani, Maziyar (398F) <
>>> [email protected]> wrote:
>>>
>>>> Hi All,
>>>>
>>>> On Monday this week Cam, Mike, and I had a 30 min talk about refactoring
>>>> the RCMES code and coming up with a code structure.
>>>> On the wiki page [1] there are two code structures, structure A and B.
>>>> Structure (B) was the one we came up with in Monday's talk.
>>>> Since then I have been trying to make some improvements on that and came
>>>> up with structure (A).
>>>> Most of the improvements were aimed at making the naming easier for
>>>> users to understand and the structure simpler.
>>>> For example:
>>>> (B)                 (A)
>>>> misc.Dataset      = Data.content
>>>> DataSource.local  = Data.retrieve.local
>>>> DatasetProcessor  = Data.process
>>>>
>>>> Here are some "import" examples we can have with the new structure:
>>>> import Data.content
>>>> import Data.process
>>>> import Data.retrieve.local
>>>> import Data.retrieve.rcmed
>>>>
>>>> The Review Board for this Python code will be up soon.
>>>>
>>>> Thoughts?
>>>>
>>>> Best,
>>>> Mazi
>>>>
>>>> [1]:
>>>> https://cwiki.apache.org/confluence/display/CLIMATE/Open+Climate+Workbench+API+summary
>>>>
>>>> On Jun 6, 2013, at 9:31 AM, Boustani, Maziyar (398F) wrote:
>>>>
>>>>> Hi All,
>>>>>
>>>>> Regarding the RCMES API refactoring (Toolkit), I thought it would be
>>>>> good to have a wiki page that summarizes the API we are going to have
>>>>> for RCMES in the future.
>>>>> This is not the actual documentation we will have later for the RCMES
>>>>> code; it is just the list of classes, modules, methods and functions
>>>>> we may need to develop.
>>>>> It would be great if you guys could help me complete this wiki page
>>>>> before we start refactoring the toolkit's code.
>>>>> https://cwiki.apache.org/confluence/display/CLIMATE/Open+Climate+Workbench+API+summary
>>>>>
>>>>> Best,
>>>>> Mazi
>>>>>
>>>>> On Jun 5, 2013, at 7:40 AM, Michael Joyce wrote:
>>>>>
>>>>>> +1 for cutting 0.1-incubating and starting these changes in 0.2.
>>>>>>
>>>>>> -- Joyce
>>>>>>
>>>>>> On Wed, Jun 5, 2013 at 7:19 AM, Mattmann, Chris A (398J) <
>>>>>> [email protected]> wrote:
>>>>>>
>>>>>>> This sounds like a good path to proceed down to me.
>>>>>>>
>>>>>>> I would formulate the below into a set of JIRA issues,
>>>>>>> then proceed by incrementally evolving the toolkit to
>>>>>>> support this.
>>>>>>>
>>>>>>> The only catch is that many of these could potentially
>>>>>>> be API back-compat issues. Since we haven't really talked
>>>>>>> about the impact of this, nor made a release,
>>>>>>> it's certainly possible to do this in trunk.
>>>>>>>
>>>>>>> My suggestion, though: since trunk represents what we
>>>>>>> all believe to be RCMET 2.0 API compat, we should probably
>>>>>>> create a branch for this. Or, better yet:
>>>>>>>
>>>>>>> 1. Close out current JIRA issues for 0.1-incubating.
>>>>>>> 2. Cut a 0.1-incubating RC/release process.
>>>>>>> 3. Start to implement the below in 0.2-incubating.
>>>>>>>
>>>>>>> Thoughts?
>>>>>>>
>>>>>>> Cheers,
>>>>>>> Chris
>>>>>>>
>>>>>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>>>>> Chris Mattmann, Ph.D.
>>>>>>> Senior Computer Scientist
>>>>>>> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
>>>>>>> Office: 171-266B, Mailstop: 171-246
>>>>>>> Email: [email protected]
>>>>>>> WWW: http://sunset.usc.edu/~mattmann/
>>>>>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>>>>> Adjunct Assistant Professor, Computer Science Department
>>>>>>> University of Southern California, Los Angeles, CA 90089 USA
>>>>>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: Michael Joyce <[email protected]>
>>>>>>> Reply-To: "[email protected]"
>>>>>>> <[email protected]>
>>>>>>> Date: Wednesday, June 5, 2013 6:56 AM
>>>>>>> To: dev <[email protected]>
>>>>>>> Subject: Proposed Toolkit Refactoring
>>>>>>>
>>>>>>>> All,
>>>>>>>>
>>>>>>>> This is a brief rundown of a discussion that Paul, Cam, Mazi, and I
>>>>>>>> had yesterday regarding the current state of the toolkit and
>>>>>>>> proposed changes that we would like to discuss with the list.
>>>>>>>>
>>>>>>>> We discussed adding a number of objects that should help simplify
>>>>>>>> toolkit usage. Below is a high-level rundown of our discussion.
>>>>>>>>
>>>>>>>> --
>>>>>>>>
>>>>>>>> Dataset: Simple container object for a dataset. Provides helpers
>>>>>>>> for accessing relevant data (getLatsLons, getTime) and convenience
>>>>>>>> functions (writeToFile()).
>>>>>>>>
>>>>>>>> DataSource: Provides the user with helper functions for grabbing
>>>>>>>> the data that they want to evaluate. There's an RCMED module
>>>>>>>> specifically for grabbing RCMED data and a Local module for
>>>>>>>> grabbing local data. This could easily be expanded to include ESG
>>>>>>>> and other data sources.
>>>>>>>>
>>>>>>>> DatasetProcessor: Any operation that needs to be run on datasets
>>>>>>>> (that isn't the evaluation obviously) is found in the
>>>>>>>> DatasetProcessor. It supports:
>>>>>>>> - regridding (spatial and temporal)
>>>>>>>> - masking/cleaning/filtering
>>>>>>>> - subsetting (spatial and temporal)
>>>>>>>> - ensemble generation
>>>>>>>> - anything else that fits here.
>>>>>>>>
>>>>>>>> Evaluation: The Evaluation object is (surprise surprise) in charge
>>>>>>>> of running Evaluations. It keeps track of the datasets (both
>>>>>>>> 'reference' and the 'targets') that the user wants to use in the
>>>>>>>> evaluation. It runs all the necessary evaluations and keeps the
>>>>>>>> results nicely stored and readily accessible for the user.
>>>>>>>>
>>>>>>>> Metric: Metrics are added to an Evaluation and used during the run.
>>>>>>>> All metrics inherit from the base Metric class. All you need to do
>>>>>>>> to add a new metric is inherit from Metric and override the 'run'
>>>>>>>> method.
>>>>>>>>
>>>>>>>> Plotter: The Plotter makes result visualization a breeze. If you
>>>>>>>> give it an Evaluation object it will spit out plots of all the
>>>>>>>> results. Give it a Dataset and it will spit out a plot. You can
>>>>>>>> even have it return Matplotlib objects so you can make your results
>>>>>>>> look exactly the way you'd like.
>>>>>>>>
>>>>>>>> -- Joyce
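
For anyone skimming the thread, the Metric and Evaluation objects described in the original proposal could look roughly like the sketch below. Only the names Dataset, Metric, Evaluation, and the overridable 'run' method come from the thread. The constructor arguments, the flat-list Dataset, the Bias formula, and the shape of the results dictionary are assumptions made for illustration, not the toolkit's actual API.

```python
from abc import ABC, abstractmethod


class Dataset:
    """Hypothetical minimal container: a name plus a flat list of values."""

    def __init__(self, name, values):
        self.name = name
        self.values = list(values)


class Metric(ABC):
    """Base class; a new metric subclasses it and overrides run()."""

    @abstractmethod
    def run(self, reference, target):
        """Return the metric value for one reference/target pair."""


class Bias(Metric):
    """Assumed definition: mean of (target - reference) over all values."""

    def run(self, reference, target):
        diffs = [t - r for r, t in zip(reference.values, target.values)]
        return sum(diffs) / len(diffs)


class Evaluation:
    """Tracks one reference dataset, the target datasets, and the metrics."""

    def __init__(self, reference, targets, metrics):
        self.reference = reference
        self.targets = list(targets)
        self.metrics = list(metrics)
        self.results = {}

    def run(self):
        # Apply every metric to every (reference, target) pair and keep
        # the results keyed by (metric class name, target dataset name).
        for metric in self.metrics:
            metric_name = type(metric).__name__
            for target in self.targets:
                key = (metric_name, target.name)
                self.results[key] = metric.run(self.reference, target)
        return self.results


obs = Dataset("obs", [1.0, 2.0, 3.0])
model = Dataset("model", [1.5, 2.5, 3.5])
evaluation = Evaluation(obs, [model], [Bias()])
results = evaluation.run()
print(results)
```

Making Metric an abstract base class means a subclass that forgets to override run() fails at construction time rather than partway through an evaluation, which is one way to enforce the "inherit and override" rule from the proposal.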
