All,

What if, instead of a Plotter, we collocate the plotting code in a Display
class or module?

Agree with Mike that the extra package nesting doesn't add anything and
should therefore be dropped.


Usage could then look something like this:

rcmed = RCMED(database_info)
obs = rcmed.loadObservation(key)
model = local.loadModel(filepath)
metric = [metrics.bias, metrics.pdf]
evaluation = Evaluation(obs, model, metric)
results = evaluation.run()
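
Just to make that concrete, the Evaluation piece could be as small as this
(purely a strawman -- the class and method behavior here are illustrative,
nothing is settled):

class Evaluation(object):
    """Holds an observation dataset, a model dataset, and a list of metrics."""

    def __init__(self, obs, model, metrics):
        self.obs = obs
        self.model = model
        self.metrics = metrics

    def run(self):
        # Apply each metric function to the (obs, model) pair and collect
        # the results in the same order as the metrics list.
        return [metric(self.obs, self.model) for metric in self.metrics]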

Maybe I'm simplifying this too much, but wouldn't the following layout suffice?

ocw
├── __init__.py
├── dataset.py
├── display.py
├── evaluate.py
├── io
│   ├── __init__.py
│   ├── esg.py
│   ├── local.py
│   └── rcmed.py
└── metrics.py
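
With that layout, the example above would read roughly like this (the module
contents and helper names below are placeholders I'm assuming from the tree,
not an agreed API):

from ocw import display, metrics      # display collects the plots, replacing Plotter
from ocw.evaluate import Evaluation
from ocw.io import local, rcmed

# Placeholder arguments and function names -- just to show the flow.
obs = rcmed.loadObservation("some_dataset_key")
model = local.loadModel("/path/to/model_output.nc")
results = Evaluation(obs, model, [metrics.bias, metrics.pdf]).run()
display.plot(results)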



--Paul



On 6/12/13 10:16 PM, "Kim, Jinwon" <[email protected]> wrote:

>
>"Do terms like 'Dataset', 'Evaluation', 'Metric', 'Plotter' make good
>sense?"
>--> these can be taken as 'standard terminology' except 'plotter'.
>
>-----------------------------------------------------------------------------------------------------
>Jinwon Kim
>Dept. Atmospheric and Oceanic Sciences and
>Joint Institute for Regional Earth System Science and Engineering
>University of California, Los Angeles
>Los Angeles, CA 90095-1565
>________________________________________
>From: [email protected] [[email protected]] on behalf of Cameron
>Goodale [[email protected]]
>Sent: Wednesday, June 12, 2013 10:10 PM
>To: [email protected]
>Subject: Re: Proposed Toolkit Refactoring
>
>I have to agree with Mike on this one, but reserve the right to change my
>mind later ;)
>
>It is a delicate balance: keeping the code organized, maintainable, and
>decoupled while still retaining an API that is easy for humans to read and
>understand.  I will always favor direct and concise names over fuzzy or
>ambiguous ones.  Naming is hard, period.
>
>I don't like misc.Dataset.py; it feels clunky, and you don't get much
>fuzzier than 'misc', so we should get that cleaned up.
>
>Thank you, Mazi, for creating the wiki page so we can all visually see the
>code structure; this is a big help to the project.
>
>I would like to invite any of the science users and/or devs to weigh in on
>this.  If the resulting API doesn't make sense to the end users, then we
>have failed (in my opinion).
>
>Question for NON-Computer Scientists:
>Do terms like 'Dataset', 'Evaluation', 'Metric', 'Plotter' make good
>sense?
>
>
>
>
>-Cameron
>
>
>On Wed, Jun 12, 2013 at 4:43 PM, Michael Joyce <[email protected]> wrote:
>
>> I find the new structuring (A) to be a bit confusing. I think the part I
>> find confusing is the 'data' part. Combining Dataset, DatasetProcessor,
>> and DataSource into 'Data' seems to cause a bit of ambiguity. Let me see
>> if I can give some examples.
>>
>> ocw.data.content is where I assume that Dataset is defined? The naming
>> doesn't really imply this to me. "content" is a bit ambiguous.
>>
>> ocw.data.processing is fine, but personally I find ocw.dataset_processor
>> to be more clear. It makes it seem like you have this object
>> (DatasetProcessor) that lets you "process datasets". The first one says
>> to me "Oh, I can process data... what does that mean?".
>>
>> In my opinion, data_source.rcmed.getDataset() is more understandable than
>> data.retrieve.rcmed.getDataset(). It makes it clear that 'rcmed' is a
>> data source from which you can get a dataset. I think the second one does
>> this as well, but not as clearly as the first. This could certainly be
>> fixed by changing some of the naming (maybe data.sources.rcmed.getDataset()),
>> but then why bother with the extra level of nesting if it's not doing
>> much/anything?
>>
>> Lastly, why have Plot.plotting.py? That extra directory
>> doesn't accomplish anything outside of adding another level of nesting.
>> If we plan on adding more 'Plot' related modules then I would say go for
>> it, but Plot.plotting seems unnecessarily redundant given that plotting.py
>> is the only module in Plot.
>> --
>>
>> I will say that I'm not completely sold on misc.dataset for defining the
>> "Dataset" class. Other than that I prefer the structuring we came up with
>> Monday over the new one. I don't feel that the new structuring helps with
>> the Dataset problem enough to warrant the changes it makes elsewhere.
>>
>>
>> -- Joyce
>>
>>
>> On Wed, Jun 12, 2013 at 1:21 PM, Boustani, Maziyar (398F) <
>> [email protected]> wrote:
>>
>> > Hi All,
>> >
>> > On Monday this week Cam, Mike, and I had a 30-minute talk about
>> > refactoring the RCMES code and coming up with a code structure.
>> > On the wiki page [1] there are two code structures, structure A and B.
>> > Structure (B) was the one we came up with in Monday's talk.
>> > Since then I have been trying to make some improvements on it and came
>> > up with structure (A).
>> > Most of the improvements were aimed at making the naming easier for
>> > users to understand and the structure simpler.
>> > For example:
>> >                         (B)                        (A)
>> >                         misc.Dataset         =     Data.content
>> >                         DataSource.local     =     Data.retrieve.local
>> >                         DatasetProcessor     =     Data.process
>> >
>> > Here are some "import" examples we can have with the new structure:
>> >         import Data.content
>> >         import Data.process
>> >         import Data.retrieve.local
>> >         import Data.retrieve.rcmed
>> >
>> > The Review Board request for this Python code will come soon.
>> >
>> > Thoughts?
>> >
>> > Best,
>> > Mazi
>> >
>> > [1]:
>> >
>> 
>>https://cwiki.apache.org/confluence/display/CLIMATE/Open+Climate+Workbench+API+summary
>> >
>> >
>> >
>> >
>> > On Jun 6, 2013, at 9:31 AM, Boustani, Maziyar (398F) wrote:
>> >
>> > > Hi All,
>> > >
>> > > Regarding the RCMES API refactoring (Toolkit), I thought it would be
>> > > good to have a wiki page that summarizes the API we are going to have
>> > > for RCMES in the future.
>> > > This is not the actual document we will have later for the RCMES code;
>> > > it is just a list of the classes, modules, methods, and functions we
>> > > may need to develop.
>> > > It would be great if you guys could help me complete this wiki before
>> > > we start refactoring the toolkit's code.
>> > >
>> > >
>> >
>> 
>>https://cwiki.apache.org/confluence/display/CLIMATE/Open+Climate+Workbench+API+summary
>> > >
>> > > Best,
>> > > Mazi
>> > >
>> > >
>> > > On Jun 5, 2013, at 7:40 AM, Michael Joyce wrote:
>> > >
>> > >> +1 for cutting 0.1-incubating and starting these changes in 0.2.
>> > >>
>> > >>
>> > >> -- Joyce
>> > >>
>> > >>
>> > >> On Wed, Jun 5, 2013 at 7:19 AM, Mattmann, Chris A (398J) <
>> > >> [email protected]> wrote:
>> > >>
>> > >>> This sounds like a good path to proceed down to me.
>> > >>>
>> > >>> I would formulate the below into a set of JIRA issues,
>> > >>> then proceed by incrementally evolving the toolkit to
>> > >>> support this.
>> > >>>
>> > >>> The only catch is that many of these could potentially
>> > >>> break API backward compatibility. Since we haven't really talked
>> > >>> about the impact of this, nor made a release, it's certainly
>> > >>> possible to do this in trunk.
>> > >>>
>> > >>> My suggestion, though, since trunk represents what we all
>> > >>> believe to be the RCMET 2.0 API-compatible line, is that we
>> > >>> should probably create a branch for this. Or, better yet:
>> > >>>
>> > >>> 1. Close out current JIRA issues for 0.1-incubating.
>> > >>> 2. Cut a 0.1-incubating RC/release process.
>> > >>> 3. Start to implement the below in 0.2-incubating.
>> > >>>
>> > >>> Thoughts?
>> > >>>
>> > >>> Cheers,
>> > >>> Chris
>> > >>>
>> > >>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> > >>> Chris Mattmann, Ph.D.
>> > >>> Senior Computer Scientist
>> > >>> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
>> > >>> Office: 171-266B, Mailstop: 171-246
>> > >>> Email: [email protected]
>> > >>> WWW:  http://sunset.usc.edu/~mattmann/
>> > >>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> > >>> Adjunct Assistant Professor, Computer Science Department
>> > >>> University of Southern California, Los Angeles, CA 90089 USA
>> > >>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> > >>>
>> > >>>
>> > >>>
>> > >>>
>> > >>>
>> > >>>
>> > >>> -----Original Message-----
>> > >>> From: Michael Joyce <[email protected]>
>> > >>> Reply-To: "[email protected]"
>> > >>> <[email protected]>
>> > >>> Date: Wednesday, June 5, 2013 6:56 AM
>> > >>> To: dev <[email protected]>
>> > >>> Subject: Proposed Toolkit Refactoring
>> > >>>
>> > >>>> All,
>> > >>>>
>> > >>>> This is a brief rundown of a discussion that Paul, Cam, Mazi,
>>and I
>> > had
>> > >>>> yesterday regarding the current state of the toolkit and proposed
>> > changes
>> > >>>> that we would like to discuss with the list.
>> > >>>>
>> > >>>> We discussed adding a number of objects that should help simplify
>> > toolkit
>> > >>>> usage. Below is a high-level rundown of our discussion.
>> > >>>>
>> > >>>> --
>> > >>>>
>> > >>>> Dataset: Simple container object for a dataset. Provides helpers
>> > >>>> for accessing relevant data (getLatsLons, getTime) and convenience
>> > >>>> functions (writeToFile()).
>> > >>>>
>> > >>>> DataSource: Provides the user with helper functions for grabbing
>> > >>>> the data that they want to evaluate. There's an RCMED module
>> > >>>> specifically for grabbing RCMED data and a Local module for grabbing
>> > >>>> local data. This could easily be expanded to include ESG and other
>> > >>>> data sources.
>> > >>>>
>> > >>>> DatasetProcessor: Any operation that needs to be run on datasets
>> > >>>> (that isn't the evaluation, obviously) is found in the
>> > >>>> DatasetProcessor. It supports:
>> > >>>> - regridding (spatial and temporal)
>> > >>>> - masking/cleaning/filtering
>> > >>>> - subsetting (spatial and temporal)
>> > >>>> - ensemble generation
>> > >>>> - anything else that fits here.
>> > >>>>
>> > >>>> Evaluation: The Evaluation object is (surprise, surprise) in charge
>> > >>>> of running evaluations. It keeps track of the datasets (both the
>> > >>>> 'reference' and the 'targets') that the user wants to use in the
>> > >>>> evaluation. It runs all the necessary evaluations and keeps the
>> > >>>> results nicely stored and readily accessible for the user.
>> > >>>>
>> > >>>> Metric: Metrics are added to an Evaluation and used during the run.
>> > >>>> All metrics inherit from the base Metric class. All you need to do
>> > >>>> to add a new metric is inherit from Metric and override the 'run'
>> > >>>> method.
>> > >>>>
>> > >>>> Plotter: The Plotter makes result visualization a breeze. If you
>> > >>>> give it an Evaluation object, it will spit out plots of all the
>> > >>>> results. Give it a Dataset and it will spit out a plot. You can even
>> > >>>> have it return Matplotlib objects so you can make your results look
>> > >>>> exactly the way you'd like.
>> > >>>>
>> > >>>> -- Joyce
>> > >>>
>> > >>>
>> > >
>> >
>> >
>>
