Re: Proposed Toolkit Refactoring

Michael Joyce Thu, 13 Jun 2013 10:43:17 -0700

Paul,

io vs data_source - I would stick with data_source. Much clearer in my
opinion.
dataset_processor - still needs to be in there


Regarding Evaluation and Metrics. They're interrelated and should be in the
same package in my opinion. I also dislike storing a ton of metric
functions in metrics. I much prefer the class formatting that we discussed
at our last meeting (perhaps I'm just misreading this part though).
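
For illustration, the class-based approach could be sketched roughly as
below. This is only a sketch under assumptions: the run(reference, target)
signature, the Bias metric, and the minimal Evaluation class are made up for
the example and are not the agreed API.

```python
from abc import ABC, abstractmethod


class Metric(ABC):
    """Hypothetical base class: every metric overrides run()."""

    @abstractmethod
    def run(self, reference, target):
        """Return the metric value for a reference/target pair."""


class Bias(Metric):
    """Mean difference between target and reference values."""

    def run(self, reference, target):
        diffs = [t - r for r, t in zip(reference, target)]
        return sum(diffs) / len(diffs)


class Evaluation:
    """Hypothetical runner: applies each metric to the two datasets."""

    def __init__(self, reference, target, metrics):
        self.reference = reference
        self.target = target
        self.metrics = metrics

    def run(self):
        # Key each result by the metric's class name.
        return {
            type(m).__name__: m.run(self.reference, self.target)
            for m in self.metrics
        }


results = Evaluation([1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [Bias()]).run()
```

Keeping Metric and Evaluation side by side like this is part of why I think
they belong in the same package.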

plotter vs display vs visualization: I like visualization the best
personally.

As a side note we need to be careful how we name files and various classes.
Otherwise our imports are going to be annoying and redundant. Say we want
to get the Metric class from the metric file. We end up with
ocw.metric.metric which is just ugly.
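
One common way to soften this, sketched below, is to re-export the class from
the package's __init__.py. The package name ocw_demo and the file layout are
hypothetical and exist only to make the example self-contained.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Build a throwaway package on disk so the example runs on its own.
root = Path(tempfile.mkdtemp())
pkg = root / "ocw_demo"  # hypothetical package name
pkg.mkdir()
(pkg / "metric.py").write_text("class Metric:\n    pass\n")
# Re-export the class at package level.
(pkg / "__init__.py").write_text("from .metric import Metric\n")

sys.path.insert(0, str(root))
importlib.invalidate_caches()

# Both spellings reach the same class; the short one avoids the stutter.
from ocw_demo.metric import Metric as LongPath
from ocw_demo import Metric as ShortPath

same_class = LongPath is ShortPath
```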

-- Joyce


On Thu, Jun 13, 2013 at 7:40 AM, Ramirez, Paul M (398J) <
[email protected]> wrote:

> All,
>
> What about instead of Plotter we collocate the plots into a Display class
> or module?
>
> Agree with Mike that the extra fluff around packages adds nothing and
> should therefore be dropped.
>
>
> rcmed = RCMED(database_info)
> obs = rcmed.loadObservation(key)
> model = local.loadModel(filepath)
> metric = [metrics.bias, metrics.pdf]
> evaluation = Evaluation(obs, model, metric)
> results = evaluation.run()
>
> Maybe I'm simplifying this too much but wouldn't the following suffice?
>
> ocw
> ├── __init__.py
> ├── dataset.py
> ├── display.py
> ├── evaluate.py
> ├── io
> │   ├── __init__.py
> │   ├── esg.py
> │   ├── local.py
> │   └── rcmed.py
> └── metrics.py
>
>
>
> --Paul
>
>
>
> On 6/12/13 10:16 PM, "Kim, Jinwon" <[email protected]> wrote:
>
> >
> >"Do terms like 'Dataset', 'Evaluation', 'Metric', 'Plotter' make good
> >sense?"
> >--> these can be taken as 'standard terminology' except 'plotter'.
> >
> >--------------------------------------------------------------------------
> >Jinwon Kim
> >Dept. Atmospheric and Oceanic Sciences and
> >Joint Institute for Regional Earth System Science and Engineering
> >University of California, Los Angeles
> >Los Angeles, CA 90095-1565
> >________________________________________
> >From: [email protected] [[email protected]] on behalf of Cameron
> >Goodale [[email protected]]
> >Sent: Wednesday, June 12, 2013 10:10 PM
> >To: [email protected]
> >Subject: Re: Proposed Toolkit Refactoring
> >
> >I have to agree with Mike on this one, but reserve the right to change my
> >mind later ;)
> >
> >It is a delicate balance between organizing code that is maintainable,
> >decoupled, and still retains an API that is easy for humans to read and
> >understand.  I will always favor direct and concise names over fuzzy or
> >ambiguous ones.  Naming is hard, period.
> >
> >I don't like misc.Dataset.py; it feels clunky, and you don't get much
> >fuzzier than 'misc', so we should get that cleaned up.
> >
> >Thank you Mazi for creating the wiki page so we can all visually see the
> >code structure, this is a big help to the project.
> >
> >I would like to invite any of the science users and/or devs to weigh in on
> >this.  If the resulting API doesn't make sense to the end users, then we
> >have failed (in my opinion).
> >
> >Question for NON-Computer Scientists:
> >Do terms like 'Dataset', 'Evaluation', 'Metric', 'Plotter' make good
> >sense?
> >
> >
> >
> >
> >-Cameron
> >
> >
> >On Wed, Jun 12, 2013 at 4:43 PM, Michael Joyce <[email protected]> wrote:
> >
> >> I find the new structuring (A) to be a bit confusing. I think the part I
> >> find confusing is the 'data' part. Combining Dataset, DatasetProcessor, and
> >> DataSource into 'Data' seems to cause a bit of ambiguity. Let me see if I
> >> can give some examples.
> >>
> >> ocw.data.content is where I assume that Dataset is defined? The naming
> >> doesn't really imply this to me. "content" is a bit ambiguous.
> >>
> >> ocw.data.processing is fine but personally I find ocw.dataset_processor to
> >> be more clear. Makes it seem like you have this object (DatasetProcessor)
> >> that lets you "process datasets". The first one says to me "Oh, I can
> >> process data...what does that mean?".
> >>
> >> In my opinion, data_source.rcmed.getDataset() is more understandable than
> >> data.retrieve.rcmed.getDataset(). It makes it clear that 'rcmed' is a
> >> datasource from which you can get a dataset. I think the second one does
> >> this as well but not as clearly as the first. This could certainly be fixed
> >> by changing some of the naming (Maybe data.sources.rcmed.getDataset()) but
> >> then why bother with the extra level of nesting if it's not doing
> >> much/anything?
> >>
> >> Lastly, why have Plot.plotting.py? That extra directory
> >> doesn't accomplish anything outside of adding another level of nesting. If
> >> we plan on adding more 'Plot' related modules then I would say go for it,
> >> but Plot.plotting seems unnecessarily redundant given that plotting.py is
> >> the only module in Plot.
> >> --
> >>
> >> I will say that I'm not completely sold on misc.dataset for defining the
> >> "Dataset" class. Other than that I prefer the structuring we came up with
> >> Monday over the new one. I don't feel that the new structuring helps with
> >> the Dataset problem enough to warrant the changes it makes elsewhere.
> >>
> >>
> >> -- Joyce
> >>
> >>
> >> On Wed, Jun 12, 2013 at 1:21 PM, Boustani, Maziyar (398F) <
> >> [email protected]> wrote:
> >>
> >> > Hi All,
> >> >
> >> > On Monday this week Cam, Mike and I had a 30-minute talk about refactoring
> >> > the RCMES code and coming up with a code structure.
> >> > On the wiki page [1] there are two code structures, structure A and B.
> >> > Structure (B) is the one we came up with in Monday's talk.
> >> > Since then I have been trying to improve on it and came up with
> >> > structure (A).
> >> > Most of the improvements were aimed at making the naming easier for
> >> > users to understand and at a simpler structure.
> >> > For example:
> >> >                         (B)                      (A)
> >> >                         misc.Dataset       =     Data.content
> >> >                         DataSource.local   =     Data.retrieve.local
> >> >                         DatasetProcessor   =     Data.process
> >> >
> >> > Here are some "import" examples we can have with the new structure:
> >> >         import Data.content
> >> >         import Data.process
> >> >         import Data.retrieve.local
> >> >         import Data.retrieve.rcmed
> >> >
> >> > A Review Board request for this Python code will be posted soon.
> >> >
> >> > Thoughts?
> >> >
> >> > Best,
> >> > Mazi
> >> >
> >> > [1]:
> >> > https://cwiki.apache.org/confluence/display/CLIMATE/Open+Climate+Workbench+API+summary
> >> >
> >> >
> >> >
> >> >
> >> > On Jun 6, 2013, at 9:31 AM, Boustani, Maziyar (398F) wrote:
> >> >
> >> > > Hi All,
> >> > >
> >> > > Regarding the RCMES refactoring API code (Toolkit), I thought it would be
> >> > > good to have a wiki page that summarizes the API we are going to have for
> >> > > RCMES in the future.
> >> > > This is not the actual documentation we will have later for the RCMES code;
> >> > > it is just the list of classes, modules, methods and functions we may need
> >> > > to develop.
> >> > > It would be great if you could help me complete this wiki page before we
> >> > > start refactoring the toolkit's code.
> >> > >
> >> > >
> >> > > https://cwiki.apache.org/confluence/display/CLIMATE/Open+Climate+Workbench+API+summary
> >> > >
> >> > > Best,
> >> > > Mazi
> >> > >
> >> > >
> >> > > On Jun 5, 2013, at 7:40 AM, Michael Joyce wrote:
> >> > >
> >> > >> +1 for cutting 0.1-incubating and starting these changes in 0.2.
> >> > >>
> >> > >>
> >> > >> -- Joyce
> >> > >>
> >> > >>
> >> > >> On Wed, Jun 5, 2013 at 7:19 AM, Mattmann, Chris A (398J) <
> >> > >> [email protected]> wrote:
> >> > >>
> >> > >>> This sounds like a good path to proceed down to me.
> >> > >>>
> >> > >>> I would formulate the below into a set of JIRA issues,
> >> > >>> then proceed by incrementally evolving the toolkit to
> >> > >>> support this.
> >> > >>>
> >> > >>> The only catch is that many of these could potentially
> >> > >>> break API back compat. Since we haven't really talked
> >> > >>> about the impact of this, nor made a release,
> >> > >>> it's certainly possible to do this in trunk.
> >> > >>>
> >> > >>> My suggestion, though: since trunk represents what we
> >> > >>> all believe to be RCMET 2.0 API compat, we should probably
> >> > >>> create a branch for this. Or, better yet:
> >> > >>>
> >> > >>> 1. Close out current JIRA issues for 0.1-incubating.
> >> > >>> 2. Cut a 0.1-incubating RC/release process.
> >> > >>> 3. Start to implement the below in 0.2-incubating.
> >> > >>>
> >> > >>> Thoughts?
> >> > >>>
> >> > >>> Cheers,
> >> > >>> Chris
> >> > >>>
> >> > >>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >> > >>> Chris Mattmann, Ph.D.
> >> > >>> Senior Computer Scientist
> >> > >>> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> >> > >>> Office: 171-266B, Mailstop: 171-246
> >> > >>> Email: [email protected]
> >> > >>> WWW:  http://sunset.usc.edu/~mattmann/
> >> > >>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >> > >>> Adjunct Assistant Professor, Computer Science Department
> >> > >>> University of Southern California, Los Angeles, CA 90089 USA
> >> > >>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >> > >>>
> >> > >>>
> >> > >>>
> >> > >>>
> >> > >>>
> >> > >>>
> >> > >>> -----Original Message-----
> >> > >>> From: Michael Joyce <[email protected]>
> >> > >>> Reply-To: "[email protected]"
> >> > >>> <[email protected]>
> >> > >>> Date: Wednesday, June 5, 2013 6:56 AM
> >> > >>> To: dev <[email protected]>
> >> > >>> Subject: Proposed Toolkit Refactoring
> >> > >>>
> >> > >>>> All,
> >> > >>>>
> >> > >>>> This is a brief rundown of a discussion that Paul, Cam, Mazi, and I had
> >> > >>>> yesterday regarding the current state of the toolkit and proposed changes
> >> > >>>> that we would like to discuss with the list.
> >> > >>>>
> >> > >>>> We discussed adding a number of objects that should help simplify toolkit
> >> > >>>> usage. Below is a high-level rundown of our discussion.
> >> > >>>>
> >> > >>>> --
> >> > >>>>
> >> > >>>> Dataset: Simple container object for a dataset. Provides helpers for
> >> > >>>> accessing relevant data (getLatsLons, getTime) and convenience functions
> >> > >>>> (writeToFile()).
> >> > >>>>
> >> > >>>> DataSource: Provides the user with helper functions for grabbing the data
> >> > >>>> that they want to evaluate. There's an RCMED module specifically for
> >> > >>>> grabbing RCMED data and a Local module for grabbing local data. This could
> >> > >>>> easily be expanded to include ESG and other data sources.
> >> > >>>>
> >> > >>>> DatasetProcessor: Any operation that needs to be run on datasets (that
> >> > >>>> isn't the evaluation obviously) is found in the DatasetProcessor. It
> >> > >>>> supports:
> >> > >>>> - regridding (spatial and temporal)
> >> > >>>> - masking/cleaning/filtering
> >> > >>>> - subsetting (spatial and temporal)
> >> > >>>> - ensemble generation
> >> > >>>> - anything else that fits here.
> >> > >>>>
> >> > >>>> Evaluation: The Evaluation object is (surprise surprise) in charge of
> >> > >>>> running Evaluations. It keeps track of the datasets (both 'reference' and
> >> > >>>> the 'targets') that the user wants to use in the evaluation. It runs all
> >> > >>>> the necessary evaluations and keeps the results nicely stored and readily
> >> > >>>> accessible for the user.
> >> > >>>>
> >> > >>>> Metric: Metrics are added to an Evaluation and used during the run. All
> >> > >>>> metrics inherit from the base Metric class. All you need to do to add a
> >> > >>>> new metric is inherit from Metric and override the 'run' method.
> >> > >>>>
> >> > >>>> Plotter: The Plotter makes result visualization a breeze. If you give it an
> >> > >>>> Evaluation object it will spit out plots of all the results. Give it a
> >> > >>>> Dataset and it will spit out a plot. You can even have it return
> >> > >>>> Matplotlib objects so you can make your results look exactly the way you'd
> >> > >>>> like.
> >> > >>>>
> >> > >>>> -- Joyce
> >> > >>>
> >> > >>>
> >> > >
> >> >
> >> >
> >>
>
>
