Re: Proposed Toolkit Refactoring

Boustani, Maziyar (398F) Fri, 14 Jun 2013 11:01:05 -0700

Hi All,

1- Thanks for helping with the naming and structure of OCW.
2- I always like to generate a flow-chart type of diagram for a better and 
easier understanding of the API structure; please check [1] for the updated version.
3- I will start coding both "dataset.py" and "local.py" and will send them to 
Review Board soon.



[1]: https://cwiki.apache.org/confluence/display/CLIMATE/Open+Climate+Workbench+API+summary

Best,
Mazi


On Jun 14, 2013, at 9:00 AM, Ramirez, Paul M (398J) wrote:

> 
> On Jun 14, 2013, at 5:29 AM, "Michael Joyce" <[email protected]> wrote:
> 
>> Paul,
>> 
>> io vs data_source - I would stick with data_source. Much clearer in my
>> opinion.
>> dataset_processor - still needs to be in there
>> 
>> Regarding Evaluation and Metrics. They're interrelated and should be in the
>> same package in my opinion. I also dislike storing a ton of metric
>> functions in metrics. I much prefer the class formatting that we discussed
>> at our last meeting (perhaps I'm just misreading this part though).
>> 
>> plotter vs display vs visualization: I like visualization the best
>> personally.
>> 
>> As a side note we need to be careful how we name files and various classes.
>> Otherwise our imports are going to be annoying and redundant. Say we want
>> to get the Metric class from the metric file. We end up with
>> ocw.metric.metric which is just ugly.
> 
> Don't think this is true if you do an import of the class in the __init__.py. 
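
The `__init__.py` point can be illustrated with a minimal sketch. It builds a throwaway package on disk, re-exports a class from the package's `__init__.py`, and checks that the short import resolves to the same class. The package name `demo_ocw` and the `Metric` body are assumptions made for illustration, not the actual OCW layout.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Build a throwaway package on disk. "demo_ocw" is a hypothetical name,
# chosen so it cannot collide with a real "ocw" installation.
root = Path(tempfile.mkdtemp())
pkg = root / "demo_ocw"
pkg.mkdir()

# demo_ocw/metric.py defines the class.
(pkg / "metric.py").write_text(
    "class Metric:\n"
    "    def run(self, reference, target):\n"
    "        raise NotImplementedError\n"
)

# demo_ocw/__init__.py re-exports the class at package level.
(pkg / "__init__.py").write_text("from .metric import Metric\n")

sys.path.insert(0, str(root))
importlib.invalidate_caches()

# Short form enabled by the re-export: no demo_ocw.metric.Metric needed.
from demo_ocw import Metric
import demo_ocw.metric

# Both spellings refer to the same class object.
same_class = Metric is demo_ocw.metric.Metric
print(same_class)
```

With the re-export in place, callers can write `from demo_ocw import Metric`, while the longer `demo_ocw.metric.Metric` path keeps working for anyone who prefers it.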
> 
> Also, I know we discussed making Metrics a class, but we don't configure them 
> now, and if we did it could be easier to just pass the metric config to the 
> Evaluation. What we need to do is look at what config we would even have in the 
> current metrics and balance that against how many knobs we are trying to put in a 
> priori. Knobs can be added when requested. 
> 
> 
>> 
>> -- Joyce
>> 
>> 
>> On Thu, Jun 13, 2013 at 7:40 AM, Ramirez, Paul M (398J) <
>> [email protected]> wrote:
>> 
>>> All,
>>> 
>>> What about instead of Plotter we collocate the plots into a Display class
>>> or module?
>>> 
>>> Agree with Mike that the extra fluff around packages isn't adding anything and
>>> therefore should be dropped.
>>> 
>>> 
>>> rcmed = RCMED(database_info)
>>> obs = rcmed.loadObservation(key)
>>> model = local.loadModel(filepath)
>>> metric = [metrics.bias, metrics.pdf]
>>> evaluation = Evaluation(obs, model, metric)
>>> results = evaluation.run()
>>> 
>>> Maybe I'm simplifying this too much but wouldn't the following suffice?
>>> 
>>> ocw
>>> ├── __init__.py
>>> ├── dataset.py
>>> ├── display.py
>>> ├── evaluate.py
>>> ├── io
>>> │   ├── __init__.py
>>> │   ├── esg.py
>>> │   ├── local.py
>>> │   └── rcmed.py
>>> └── metrics.py
>>> 
>>> 
>>> 
>>> --Paul
>>> 
>>> 
>>> 
>>> On 6/12/13 10:16 PM, "Kim, Jinwon" <[email protected]> wrote:
>>> 
>>>> 
>>>> "Do terms like 'Dataset', 'Evaluation', 'Metric', 'Plotter' make good
>>>> sense?"
>>>> --> these can be taken as 'standard terminology' except 'plotter'.
>>>> 
>>>> --------------------------------------------------------------------------
>>>> ---------------------------
>>>> Jinwon Kim
>>>> Dept. Atmospheric and Oceanic Sciences and
>>>> Joint Institute for Regional Earth System Science and Engineering
>>>> University of California, Los Angeles
>>>> Los Angeles, CA 90095-1565
>>>> ________________________________________
>>>> From: [email protected] [[email protected]] on behalf of Cameron
>>>> Goodale [[email protected]]
>>>> Sent: Wednesday, June 12, 2013 10:10 PM
>>>> To: [email protected]
>>>> Subject: Re: Proposed Toolkit Refactoring
>>>> 
>>>> I have to agree with Mike on this one, but reserve the right to change my
>>>> mind later ;)
>>>> 
>>>> It is a delicate balance between organizing code that is maintainable,
>>>> decoupled, and still retains an API that is easy for humans to read and
>>>> understand.  I will always favor direct and concise names over fuzzy or
>>>> ambiguous ones.  Naming is hard, period.
>>>> 
>>>> I don't like misc.Dataset.py; it feels clunky, and you don't get much
>>>> more fuzzy than 'misc', so we should get that cleaned up.
>>>> 
>>>> Thank you Mazi for creating the wiki page so we can all visually see the
>>>> code structure, this is a big help to the project.
>>>> 
>>>> I would like to invite any of the science users and/or devs to weigh in on
>>>> this.  If the resulting API doesn't make sense to the end users, then we
>>>> have failed (in my opinion).
>>>> 
>>>> Question for NON-Computer Scientists:
>>>> Do terms like 'Dataset', 'Evaluation', 'Metric', 'Plotter' make good
>>>> sense?
>>>> 
>>>> 
>>>> 
>>>> 
>>>> -Cameron
>>>> 
>>>> 
>>>> On Wed, Jun 12, 2013 at 4:43 PM, Michael Joyce <[email protected]> wrote:
>>>> 
>>>>> I find the new structuring (A) to be a bit confusing. I think the part I
>>>>> find confusing is the 'data' part. Combining Dataset, DatasetProcessor, and
>>>>> DataSource into 'Data' seems to cause a bit of ambiguity. Let me see if I
>>>>> can give some examples.
>>>>> 
>>>>> ocw.data.content is where I assume that Dataset is defined? The naming
>>>>> doesn't really imply this to me. "content" is a bit ambiguous.
>>>>> 
>>>>> ocw.data.processing is fine but personally I find ocw.dataset_processor to
>>>>> be more clear. It makes it seem like you have this object (DatasetProcessor)
>>>>> that lets you "process datasets". The first one says to me "Oh, I can
>>>>> process data...what does that mean?".
>>>>> 
>>>>> In my opinion, data_source.rcmed.getDataset() is more understandable than
>>>>> data.retrieve.rcmed.getDataset(). It makes it clear that 'rcmed' is a
>>>>> data source from which you can get a dataset. I think the second one does
>>>>> this as well but not as clearly as the first. This could certainly be fixed
>>>>> by changing some of the naming (maybe data.sources.rcmed.getDataset()), but
>>>>> then why bother with the extra level of nesting if it's not doing
>>>>> much/anything?
>>>>> 
>>>>> Lastly, why have Plot.plotting.py? That extra directory
>>>>> doesn't accomplish anything outside of adding another level of nesting. If
>>>>> we plan on adding more 'Plot' related modules then I would say go for it,
>>>>> but Plot.plotting seems unnecessarily redundant given that plotting.py is
>>>>> the only module in Plot.
>>>>> --
>>>>> 
>>>>> I will say that I'm not completely sold on misc.dataset for defining the
>>>>> "Dataset" class. Other than that I prefer the structuring we came up with
>>>>> Monday over the new one. I don't feel that the new structuring helps with
>>>>> the Dataset problem enough to warrant the changes it makes elsewhere.
>>>>> 
>>>>> 
>>>>> -- Joyce
>>>>> 
>>>>> 
>>>>> On Wed, Jun 12, 2013 at 1:21 PM, Boustani, Maziyar (398F) <
>>>>> [email protected]> wrote:
>>>>> 
>>>>>> Hi All,
>>>>>> 
>>>>>> On Monday this week Cam, Mike and I had a 30 min talk about refactoring
>>>>>> the RCMES code and coming up with a code structure.
>>>>>> On the wiki page [1] there are two code structures, structure A and B.
>>>>>> Structure (B) was the one we came up with in Monday's talk.
>>>>>> Since then I have been trying to make some improvements on that and came up
>>>>>> with structure (A).
>>>>>> Most of the improvements were about making the naming easier for users to
>>>>>> understand and the structure simpler.
>>>>>> For example:
>>>>>>        (B)                     (A)
>>>>>>        misc.Dataset       =    Data.content
>>>>>>        DataSource.local   =    Data.retrieve.local
>>>>>>        DatasetProcessor   =    Data.process
>>>>>> Here are some "import" examples we can have with the new structure:
>>>>>>       import Data.content
>>>>>>       import Data.process
>>>>>>       import Data.retrieve.local
>>>>>>       import Data.retrieve.rcmed
>>>>>> 
>>>>>> The Review Board request for this Python code will be posted soon.
>>>>>> 
>>>>>> Thoughts?
>>>>>> 
>>>>>> Best,
>>>>>> Mazi
>>>>>> 
>>>>>> [1]:
>>>>>> https://cwiki.apache.org/confluence/display/CLIMATE/Open+Climate+Workbench+API+summary
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> On Jun 6, 2013, at 9:31 AM, Boustani, Maziyar (398F) wrote:
>>>>>> 
>>>>>>> Hi All,
>>>>>>> 
>>>>>>> Regarding the RCMES API refactoring (Toolkit), I thought it would be
>>>>>>> good to have a wiki page that summarizes the API we are going to have for
>>>>>>> RCMES in the future.
>>>>>>> This is not the actual documentation we will have later for the RCMES code;
>>>>>>> it is just the list of classes, modules, methods and functions we may need to
>>>>>>> develop.
>>>>>>> It would be great if you could help me complete this wiki page before we
>>>>>>> start refactoring the toolkit's code.
>>>>>>> https://cwiki.apache.org/confluence/display/CLIMATE/Open+Climate+Workbench+API+summary
>>>>>>> 
>>>>>>> Best,
>>>>>>> Mazi
>>>>>>> 
>>>>>>> 
>>>>>>> On Jun 5, 2013, at 7:40 AM, Michael Joyce wrote:
>>>>>>> 
>>>>>>>> +1 for cutting 0.1-incubating and starting these changes in 0.2.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> -- Joyce
>>>>>>>> 
>>>>>>>> 
>>>>>>>> On Wed, Jun 5, 2013 at 7:19 AM, Mattmann, Chris A (398J) <
>>>>>>>> [email protected]> wrote:
>>>>>>>> 
>>>>>>>>> This sounds like a good path to proceed down to me.
>>>>>>>>> 
>>>>>>>>> I would formulate the below into a set of JIRA issues,
>>>>>>>>> then proceed by incrementally evolving the toolkit to
>>>>>>>>> support this.
>>>>>>>>> 
>>>>>>>>> The only catch is that many of these could potentially
>>>>>>>>> break API back compat. Since we haven't really talked about
>>>>>>>>> or suggested the impact of this, nor made a release,
>>>>>>>>> it's certainly possible to do this in trunk.
>>>>>>>>> 
>>>>>>>>> My suggestion, though: since trunk represents what we
>>>>>>>>> all believe to be RCMET 2.0 API compat, we should probably
>>>>>>>>> create a branch for this. Or, better yet:
>>>>>>>>> 
>>>>>>>>> 1. Close out current JIRA issues for 0.1-incubating.
>>>>>>>>> 2. Cut a 0.1-incubating RC/release process.
>>>>>>>>> 3. Start to implement the below in 0.2-incubating.
>>>>>>>>> 
>>>>>>>>> Thoughts?
>>>>>>>>> 
>>>>>>>>> Cheers,
>>>>>>>>> Chris
>>>>>>>>> 
>>>>>>>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>>>>>>> Chris Mattmann, Ph.D.
>>>>>>>>> Senior Computer Scientist
>>>>>>>>> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
>>>>>>>>> Office: 171-266B, Mailstop: 171-246
>>>>>>>>> Email: [email protected]
>>>>>>>>> WWW:  http://sunset.usc.edu/~mattmann/
>>>>>>>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>>>>>>> Adjunct Assistant Professor, Computer Science Department
>>>>>>>>> University of Southern California, Los Angeles, CA 90089 USA
>>>>>>>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> -----Original Message-----
>>>>>>>>> From: Michael Joyce <[email protected]>
>>>>>>>>> Reply-To: "[email protected]"
>>>>>>>>> <[email protected]>
>>>>>>>>> Date: Wednesday, June 5, 2013 6:56 AM
>>>>>>>>> To: dev <[email protected]>
>>>>>>>>> Subject: Proposed Toolkit Refactoring
>>>>>>>>> 
>>>>>>>>>> All,
>>>>>>>>>> 
>>>>>>>>>> This is a brief rundown of a discussion that Paul, Cam, Mazi, and I had
>>>>>>>>>> yesterday regarding the current state of the toolkit and proposed changes
>>>>>>>>>> that we would like to discuss with the list.
>>>>>>>>>> 
>>>>>>>>>> We discussed adding a number of objects that should help simplify toolkit
>>>>>>>>>> usage. Below is a high-level rundown of our discussion.
>>>>>>>>>> 
>>>>>>>>>> --
>>>>>>>>>> 
>>>>>>>>>> Dataset: Simple container object for a dataset. Provides helpers for
>>>>>>>>>> accessing relevant data (getLatsLons, getTime) and convenience functions
>>>>>>>>>> (writeToFile()).
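
A minimal sketch of such a container, assuming plain Python lists for the coordinate and value arrays and JSON as the on-disk format. The field names, the snake_case method names (`get_lats_lons`, `get_time`, `write_to_file`), and the file format are all assumptions for illustration; the thread only names the helpers `getLatsLons`, `getTime` and `writeToFile()`.

```python
import json
import os
import tempfile
from dataclasses import dataclass


@dataclass
class Dataset:
    """Simple container for one dataset (field names are assumptions)."""
    lats: list
    lons: list
    times: list   # ISO date strings, to keep the sketch JSON-friendly
    values: list  # nested lists indexed as [time][lat][lon]
    name: str = ""

    def get_lats_lons(self):
        """Return the latitude and longitude axes as a pair."""
        return self.lats, self.lons

    def get_time(self):
        """Return the time axis."""
        return self.times

    def write_to_file(self, path):
        """Dump the dataset to a JSON file (format chosen for the sketch)."""
        payload = {
            "name": self.name,
            "lats": self.lats,
            "lons": self.lons,
            "times": self.times,
            "values": self.values,
        }
        with open(path, "w") as handle:
            json.dump(payload, handle)


dataset = Dataset(
    lats=[10.0, 20.0],
    lons=[100.0, 110.0],
    times=["2013-06-01"],
    values=[[[1.0, 2.0], [3.0, 4.0]]],
    name="demo",
)

# Write to a temporary location and read the file back.
out_path = os.path.join(tempfile.mkdtemp(), "demo.json")
dataset.write_to_file(out_path)
with open(out_path) as handle:
    round_trip = json.load(handle)
```

Keeping the container free of loading and processing logic matches the split described in the rundown, where DataSource and DatasetProcessor own those jobs.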
>>>>>>>>>> 
>>>>>>>>>> DataSource: Provides the user with helper functions for grabbing the data
>>>>>>>>>> that they want to evaluate. There's an RCMED module specifically for
>>>>>>>>>> grabbing RCMED data and a Local module for grabbing local data. This could
>>>>>>>>>> easily be expanded to include ESG and other data sources.
>>>>>>>>>> 
>>>>>>>>>> DatasetProcessor: Any operation that needs to be run on datasets (that
>>>>>>>>>> isn't the evaluation, obviously) is found in the DatasetProcessor. It
>>>>>>>>>> supports:
>>>>>>>>>> - regridding (spatial and temporal)
>>>>>>>>>> - masking/cleaning/filtering
>>>>>>>>>> - subsetting (spatial and temporal)
>>>>>>>>>> - ensemble generation
>>>>>>>>>> - anything else that fits here.
>>>>>>>>>> 
>>>>>>>>>> Evaluation: The Evaluation object is (surprise surprise) in charge of
>>>>>>>>>> running Evaluations. It keeps track of the datasets (both 'reference' and
>>>>>>>>>> the 'targets') that the user wants to use in the evaluation. It runs all
>>>>>>>>>> the necessary evaluations and keeps the results nicely stored and readily
>>>>>>>>>> accessible for the user.
>>>>>>>>>> 
>>>>>>>>>> Metric: Metrics are added to an Evaluation and used during the run. All
>>>>>>>>>> metrics inherit from the base Metric class. All you need to do to add new
>>>>>>>>>> metrics is inherit from Metric and override the 'run' method.
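
The Evaluation and Metric descriptions above can be sketched as follows. Only the base `Metric` class, the overridable `run` method, and the reference/targets split come from the rundown; the `Bias` metric, the constructor signature, and the layout of `results` are assumptions for illustration, and plain lists of numbers stand in for real datasets.

```python
class Metric:
    """Base class; concrete metrics override run()."""

    def run(self, reference, target):
        raise NotImplementedError("subclasses must override run()")


class Bias(Metric):
    """Hypothetical metric: mean of (target - reference)."""

    def run(self, reference, target):
        diffs = [t - r for r, t in zip(reference, target)]
        return sum(diffs) / len(diffs)


class Evaluation:
    """Tracks one reference, several targets, and the metrics to run."""

    def __init__(self, reference, targets, metrics):
        self.reference = reference
        self.targets = list(targets)
        self.metrics = list(metrics)
        self.results = []

    def run(self):
        # results[i][j] holds metric i applied to target j.
        self.results = [
            [metric.run(self.reference, target) for target in self.targets]
            for metric in self.metrics
        ]
        return self.results


reference = [1.0, 2.0, 3.0]
targets = [[1.5, 2.5, 3.5], [1.0, 2.0, 3.0]]

evaluation = Evaluation(reference, targets, [Bias()])
results = evaluation.run()
print(results)
```

Adding another metric under this sketch means writing one more subclass with its own `run` and appending an instance to the list passed to `Evaluation`.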
>>>>>>>>>> 
>>>>>>>>>> Plotter: The Plotter makes result visualization a breeze. If you give it
>>>>>>>>>> an Evaluation object it will spit out plots of all the results. Give it a
>>>>>>>>>> Dataset and it will spit out a plot. You can even have it return
>>>>>>>>>> Matplotlib objects so you can make your results look exactly the way you'd
>>>>>>>>>> like.
>>>>>>>>>> 
>>>>>>>>>> -- Joyce
>>> 
>>> 
> 
> --Paul
