Re: Proposed Toolkit Refactoring

Ramirez, Paul M (398J) Fri, 14 Jun 2013 08:53:44 -0700


On Jun 14, 2013, at 7:30 AM, "Kim, Jinwon" <[email protected]> wrote:


> How about 'visualize.py' in place of 'display.py'?  Visualize may be more
> common than display in the climate community (e.g., visualization is one of the
> main themes in computing projects related to climate research).

+1

> 
> To me, the current module metrics.py may need to be split, as it is getting
> very long (>1400 lines in my version, which was substantially cleaned up from
> version 2.0).
> 
> 

Agreed. I'll take a stab at this. 



> 
> 
> -----------------------------------------------------------------------------------------------------
> Jinwon Kim
> Dept. Atmospheric and Oceanic Sciences and
> Joint Institute for Regional Earth System Science and Engineering
> University of California, Los Angeles
> Los Angeles, CA 90095-1565
> ________________________________________
> From: Ramirez, Paul M (398J) [[email protected]]
> Sent: Thursday, June 13, 2013 7:40 AM
> To: [email protected]
> Subject: Re: Proposed Toolkit Refactoring
> 
> All,
> 
> What about, instead of Plotter, we collocate the plots into a Display class
> or module?
> 
> Agree with Mike that the extra fluff around packages adds nothing and
> should therefore be dropped.
> 
> 
> rcmed = RCMED(database_info)
> obs = rcmed.loadObservation(key)
> model = local.loadModel(filepath)
> metric = [metrics.bias, metrics.pdf]
> evaluation = Evaluation(obs, model, metric)
> results = evaluation.run()
> 
> Maybe I'm simplifying this too much but wouldn't the following suffice?
> 
> ocw
> ├── __init__.py
> ├── dataset.py
> ├── display.py
> ├── evaluate.py
> ├── io
> │   ├── __init__.py
> │   ├── esg.py
> │   ├── local.py
> │   └── rcmed.py
> └── metrics.py
> 
> 
> 
> --Paul
> 
> 
> 
> On 6/12/13 10:16 PM, "Kim, Jinwon" <[email protected]> wrote:
> 
>> 
>> "Do terms like 'Dataset', 'Evaluation', 'Metric', 'Plotter' make good
>> sense?"
>> --> these can be taken as 'standard terminology' except 'plotter'.
>> 
>> -----------------------------------------------------------------------------------------------------
>> Jinwon Kim
>> Dept. Atmospheric and Oceanic Sciences and
>> Joint Institute for Regional Earth System Science and Engineering
>> University of California, Los Angeles
>> Los Angeles, CA 90095-1565
>> ________________________________________
>> From: [email protected] [[email protected]] on behalf of Cameron
>> Goodale [[email protected]]
>> Sent: Wednesday, June 12, 2013 10:10 PM
>> To: [email protected]
>> Subject: Re: Proposed Toolkit Refactoring
>> 
>> I have to agree with Mike on this one, but reserve the right to change my
>> mind later ;)
>> 
>> It is a delicate balance between organizing code that is maintainable,
>> decoupled, and still retains an API that is easy for humans to read and
>> understand.  I will always favor direct and concise names over fuzzy or
>> ambiguous ones.  Naming is hard, period.
>> 
>> I don't like the misc.Dataset.py; that feels clunky and you don't get much
>> more fuzzy than 'misc', so we should get that cleaned up.
>> 
>> Thank you Mazi for creating the wiki page so we can all visually see the
>> code structure, this is a big help to the project.
>> 
>> I would like to invite any of the science users and/or devs to weigh in on
>> this.  If the resulting API doesn't make sense to the end users, then we
>> have failed (in my opinion).
>> 
>> Question for NON-Computer Scientists:
>> Do terms like 'Dataset', 'Evaluation', 'Metric', 'Plotter' make good
>> sense?
>> 
>> 
>> 
>> 
>> -Cameron
>> 
>> 
>> On Wed, Jun 12, 2013 at 4:43 PM, Michael Joyce <[email protected]> wrote:
>> 
>>> I find the new structuring (A) to be a bit confusing. I think the part I
>>> find confusing is the 'data' part. Combining Dataset, DatasetProcessor, and
>>> DataSource into 'Data' seems to cause a bit of ambiguity. Let me see if I
>>> can give some examples.
>>> 
>>> ocw.data.content is where I assume that Dataset is defined? The naming
>>> doesn't really imply this to me. "content" is a bit ambiguous.
>>> 
>>> ocw.data.processing is fine but personally I find ocw.dataset_processor to
>>> be more clear. Makes it seem like you have this object (DatasetProcessor)
>>> that lets you "process datasets". The first one says to me "Oh, I can
>>> process data...what does that mean?".
>>> 
>>> In my opinion, data_source.rcmed.getDataset() is more understandable than
>>> data.retrieve.rcmed.getDataset(). It makes it clear that 'rcmed' is a
>>> datasource from which you can get a dataset. I think the second one does
>>> this as well but not as clearly as the first. This could certainly be fixed
>>> by changing some of the naming (Maybe data.sources.rcmed.getDataset()) but
>>> then why bother with the extra level of nesting if it's not doing
>>> much/anything?
>>> 
>>> Lastly, why have Plot.plotting.py? That extra directory
>>> doesn't accomplish anything outside of adding another level of nesting. If
>>> we plan on adding more 'Plot' related modules then I would say go for it,
>>> but Plot.plotting seems unnecessarily redundant given that plotting.py is
>>> the only module in Plot.
>>> --
>>> 
>>> I will say that I'm not completely sold on misc.dataset for defining the
>>> "Dataset" class. Other than that I prefer the structuring we came up with
>>> Monday over the new one. I don't feel that the new structuring helps with
>>> the Dataset problem enough to warrant the changes it makes elsewhere.
>>> 
>>> 
>>> -- Joyce
>>> 
>>> 
>>> On Wed, Jun 12, 2013 at 1:21 PM, Boustani, Maziyar (398F) <
>>> [email protected]> wrote:
>>> 
>>>> Hi All,
>>>> 
>>>> On Monday this week Cam, Mike, and I had a 30-minute talk about refactoring
>>>> the RCMES code and coming up with a code structure.
>>>> The wiki page [1] shows two code structures, (A) and (B).
>>>> Structure (B) is the one we came up with in Monday's talk.
>>>> Since then I have been trying to improve on it and came up with
>>>> structure (A).
>>>> Most of the improvements aim to make the naming easier for users to
>>>> understand and the structure simpler.
>>>> For example:
>>>>        (B)                  (A)
>>>>        misc.Dataset      =  Data.content
>>>>        DataSource.local  =  Data.retrieve.local
>>>>        DatasetProcessor  =  Data.process
>>>> 
>>>> Here are some "import" examples we can have with the new structure:
>>>>        import Data.content
>>>>        import Data.process
>>>>        import Data.retrieve.local
>>>>        import Data.retrieve.rcmed
>>>> 
>>>> The Review Board request for this Python code will be posted soon.
>>>> 
>>>> Thoughts?
>>>> 
>>>> Best,
>>>> Mazi
>>>> 
>>>> [1]: https://cwiki.apache.org/confluence/display/CLIMATE/Open+Climate+Workbench+API+summary
>>>> 
>>>> 
>>>> 
>>>> 
>>>> On Jun 6, 2013, at 9:31 AM, Boustani, Maziyar (398F) wrote:
>>>> 
>>>>> Hi All,
>>>>> 
>>>>> Regarding the RCMES API refactoring (Toolkit), I thought it would be
>>>>> good to have a wiki page that summarizes the API we are going to have
>>>>> for RCMES in the future.
>>>>> This is not the actual documentation we will have later for the RCMES
>>>>> code; it is just a list of the classes, modules, methods, and functions
>>>>> we may need to develop.
>>>>> It would be great if you could help me complete this wiki page before
>>>>> we start refactoring the toolkit's code.
>>>>> 
>>>>> https://cwiki.apache.org/confluence/display/CLIMATE/Open+Climate+Workbench+API+summary
>>>>> 
>>>>> Best,
>>>>> Mazi
>>>>> 
>>>>> 
>>>>> On Jun 5, 2013, at 7:40 AM, Michael Joyce wrote:
>>>>> 
>>>>>> +1 for cutting 0.1-incubating and starting these changes in 0.2.
>>>>>> 
>>>>>> 
>>>>>> -- Joyce
>>>>>> 
>>>>>> 
>>>>>> On Wed, Jun 5, 2013 at 7:19 AM, Mattmann, Chris A (398J) <
>>>>>> [email protected]> wrote:
>>>>>> 
>>>>>>> This sounds like a good path to proceed down to me.
>>>>>>> 
>>>>>>> I would formulate the below into a set of JIRA issues,
>>>>>>> then proceed by incrementally evolving the toolkit to
>>>>>>> support this.
>>>>>>> 
>>>>>>> The only catch is that many of these could potentially
>>>>>>> break API back compat. Since we haven't really talked about or
>>>>>>> suggested the impact of this, nor made a release,
>>>>>>> it's certainly possible to do this in trunk.
>>>>>>> 
>>>>>>> My suggestion, though: since trunk represents what we
>>>>>>> all believe to be RCMET 2.0 API compat, we should probably
>>>>>>> create a branch for this. Or, better yet:
>>>>>>> 
>>>>>>> 1. Close out current JIRA issues for 0.1-incubating.
>>>>>>> 2. Cut a 0.1-incubating RC/release process.
>>>>>>> 3. Start to implement the below in 0.2-incubating.
>>>>>>> 
>>>>>>> Thoughts?
>>>>>>> 
>>>>>>> Cheers,
>>>>>>> Chris
>>>>>>> 
>>>>>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>>>>> Chris Mattmann, Ph.D.
>>>>>>> Senior Computer Scientist
>>>>>>> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
>>>>>>> Office: 171-266B, Mailstop: 171-246
>>>>>>> Email: [email protected]
>>>>>>> WWW:  http://sunset.usc.edu/~mattmann/
>>>>>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>>>>> Adjunct Assistant Professor, Computer Science Department
>>>>>>> University of Southern California, Los Angeles, CA 90089 USA
>>>>>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> -----Original Message-----
>>>>>>> From: Michael Joyce <[email protected]>
>>>>>>> Reply-To: "[email protected]"
>>>>>>> <[email protected]>
>>>>>>> Date: Wednesday, June 5, 2013 6:56 AM
>>>>>>> To: dev <[email protected]>
>>>>>>> Subject: Proposed Toolkit Refactoring
>>>>>>> 
>>>>>>>> All,
>>>>>>>> 
>>>>>>>> This is a brief rundown of a discussion that Paul, Cam, Mazi, and I had
>>>>>>>> yesterday regarding the current state of the toolkit and proposed changes
>>>>>>>> that we would like to discuss with the list.
>>>>>>>> 
>>>>>>>> We discussed adding a number of objects that should help simplify toolkit
>>>>>>>> usage. Below is a high-level rundown of our discussion.
>>>>>>>> 
>>>>>>>> --
>>>>>>>> 
>>>>>>>> Dataset: Simple container object for a dataset. Provides helpers for
>>>>>>>> accessing relevant data (getLatsLons, getTime) and convenience functions
>>>>>>>> (writeToFile()).
>>>>>>>> 
>>>>>>>> DataSource: Provides the user with helper functions for grabbing the data
>>>>>>>> that they want to evaluate. There's a RCMED module specifically for
>>>>>>>> grabbing RCMED data and a Local module for grabbing local data. This could
>>>>>>>> easily be expanded to include ESG and other data sources.
>>>>>>>> 
>>>>>>>> DatasetProcessor: Any operation that needs to be run on datasets (that
>>>>>>>> isn't the evaluation obviously) is found in the DatasetProcessor. It
>>>>>>>> supports:
>>>>>>>> - regridding (spatial and temporal)
>>>>>>>> - masking/cleaning/filtering
>>>>>>>> - subsetting (spatial and temporal)
>>>>>>>> - ensemble generation
>>>>>>>> - anything else that fits here.
>>>>>>>> 
>>>>>>>> Evaluation: The Evaluation object is (surprise surprise) in charge of
>>>>>>>> running Evaluations. It keeps track of the datasets (both 'reference' and
>>>>>>>> the 'targets') that the user wants to use in the evaluation. It runs all
>>>>>>>> the necessary evaluations and keeps the results nicely stored and readily
>>>>>>>> accessible for the user.
>>>>>>>> 
>>>>>>>> Metric: Metrics are added to an Evaluation and used during the run. All
>>>>>>>> metrics inherit from the base Metric class. All you need to add new
>>>>>>>> metrics is inherit from Metric and override the 'run' method.
>>>>>>>> 
>>>>>>>> Plotter: The Plotter makes result visualization a breeze. If you give it
>>>>>>>> an Evaluation object it will spit out plots of all the results. Give it a
>>>>>>>> Dataset and it will spit out a plot. You can even have it return
>>>>>>>> Matplotlib objects so you can make your results look exactly the way you'd
>>>>>>>> like.
>>>>>>>> 
>>>>>>>> -- Joyce
>
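
The Metric and Evaluation objects described in the original proposal (metrics inherit from a base Metric class and override 'run'; an Evaluation tracks a 'reference' dataset and the 'targets', runs the metrics, and stores the results) could be sketched roughly as follows. This is only an illustration under stated assumptions: plain Python lists stand in for Dataset objects, and the Bias class, the constructor arguments, and the layout of the results dictionary are hypothetical, not the toolkit's actual API.

```python
from abc import ABC, abstractmethod
from statistics import mean


class Metric(ABC):
    """Base class for metrics; subclasses override run()."""

    @abstractmethod
    def run(self, reference, target):
        """Compare one target dataset against the reference dataset."""


class Bias(Metric):
    """Hypothetical example metric: mean of (target - reference)."""

    def run(self, reference, target):
        return mean(t - r for r, t in zip(reference, target))


class Evaluation:
    """Tracks a reference dataset, the targets, and the metrics to run."""

    def __init__(self, reference, targets, metrics):
        self.reference = reference
        self.targets = list(targets)
        self.metrics = list(metrics)
        self.results = {}

    def run(self):
        # Assumed layout: results[metric class name] holds one value
        # per target, in the order the targets were supplied.
        for metric in self.metrics:
            name = type(metric).__name__
            self.results[name] = [
                metric.run(self.reference, target) for target in self.targets
            ]
        return self.results


# Plain lists stand in for Dataset objects in this sketch.
obs = [1.0, 2.0, 3.0]
model = [1.5, 2.5, 3.5]

evaluation = Evaluation(obs, [model], [Bias()])
results = evaluation.run()
```

Adding a new metric under this sketch means writing another Metric subclass with its own run method; the Evaluation class does not need to change.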
