Michael Joyce created CLIMATE-575:
-------------------------------------

             Summary: Implement initial config based execution of an evaluation
                 Key: CLIMATE-575
                 URL: https://issues.apache.org/jira/browse/CLIMATE-575
             Project: Apache Open Climate Workbench
          Issue Type: Task
          Components: general
    Affects Versions: 0.5
            Reporter: Michael Joyce
            Assignee: Michael Joyce
             Fix For: 1.0.0


I'm brainstorming ideas for an initial config format for running an evaluation,
and one proposal is below. Note that it doesn't necessarily cover all the
functionality in the system yet. Empty sections are still a work in progress
and will be filled in when possible.

---
At the moment, the assumption is that there will be a single config file for
one evaluation.

h1. Sections
There will be sections for
* Datasets
* Metrics
* Plotting

h2. Datasets
Specified under a \[datasets\] tag. This section lists every dataset that will
be loaded. Each dataset is specified in the following format:

eval_purpose_identifier: data_source_keyword data_source_locator_data 
optional_keyword_args
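
As a rough illustration of how a line in this format might be taken apart, here is a minimal sketch in Python. The function name, the example path, and the example URL are hypothetical and not part of the library. It assumes the identifier is followed by a colon and that values containing whitespace are wrapped in quotes, which is one possible answer to the grouping question rather than a decided rule.

```python
import shlex

VALID_PURPOSES = {"reference", "target"}
VALID_SOURCES = {"local", "dap", "rcmed", "esgf"}


def parse_dataset_line(line):
    """Split one dataset line into (purpose, source, remaining tokens).

    Hypothetical helper for illustration only.
    """
    # Split on the first ':' only, so colons inside URLs are left alone.
    purpose, sep, rest = line.partition(":")
    purpose = purpose.strip()
    if not sep or purpose not in VALID_PURPOSES:
        raise ValueError("expected 'reference:' or 'target:' in %r" % line)

    # shlex keeps quoted text together, e.g. "/data/my files/obs.nc".
    tokens = shlex.split(rest)
    if not tokens or tokens[0] not in VALID_SOURCES:
        raise ValueError("unknown data source in %r" % line)

    return purpose, tokens[0], tokens[1:]


# Made-up example lines; the path and URL are placeholders.
print(parse_dataset_line('reference: local "/data/my files/obs.nc" pr'))
print(parse_dataset_line("target: dap http://example.org/dods/model tasmax"))
```

The remaining tokens would then be interpreted according to the data source, as described in the sections below.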

h3. eval_purpose_identifier
Either "reference" or "target". If there are multiple target datasets in the
evaluation then they should all share the eval_purpose_identifier of "target".

h3. data_source_keyword
Specifies which data source will be used to load this dataset. In the current
state of the library, the valid options are "local", "dap", "rcmed", and
"esgf".

h3. data_source_locator_data
Data necessary for loading the dataset. This varies based on the data source 
that will be used for loading this data. If you look at the docs for the data 
sources, these are effectively the required elements for loading a dataset.

h4. local data_source_locator_data
There will be two parts for a local data source, separated by a space.
* The path to the file to load (for a single-file dataset) or, for a multi-file
dataset, the path to the directory where the files are located, followed by the
accepted separator text (tentatively "###") and the glob pattern for the files
to load.
* The variable name
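
To make the "###" idea concrete, here is a small sketch of how the path part could be split and expanded. The helper names and the example paths are hypothetical, and "###" is only the tentative separator mentioned above.

```python
import glob
import os

SEPARATOR = "###"  # tentative separator from the proposal


def split_local_path(token):
    """Return (path, glob_pattern); the pattern is None for a single file."""
    directory, sep, pattern = token.partition(SEPARATOR)
    if sep:
        return directory, pattern
    return token, None


def expand_local_path(token):
    """Return the list of files that the path part refers to."""
    path, pattern = split_local_path(token)
    if pattern is None:
        return [path]
    # Sort so the load order does not depend on the filesystem.
    return sorted(glob.glob(os.path.join(path, pattern)))


# Made-up example paths.
print(split_local_path("/data/models###*.nc"))
print(split_local_path("/data/obs.nc"))
```

With this layout a single-file path passes through unchanged, so the same code path could handle both cases.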

h4. dap data_source_locator_data
Each of these should be separated by a space.
* OpenDAP URL
* Variable name

h4. rcmed data_source_locator_data
Each of these should be separated by a space.
* dataset_id
* parameter_id
* min_lat
* max_lat
* min_lon
* max_lon
* start_time
* end_time
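
A sketch of how these eight values might be converted to typed values once they have been split apart. The helper name is hypothetical. It assumes integer IDs and, since the time format is still an open question, it assumes dates written as YYYY-MM-DD purely for illustration.

```python
from datetime import datetime

# Assumption for illustration only: the time format is not decided yet.
TIME_FORMAT = "%Y-%m-%d"


def parse_rcmed_locator(tokens):
    """Convert the eight RCMED values (as strings) into typed values."""
    if len(tokens) < 8:
        raise ValueError("expected 8 RCMED values, got %d" % len(tokens))
    return {
        "dataset_id": int(tokens[0]),      # assumes integer IDs
        "parameter_id": int(tokens[1]),
        "min_lat": float(tokens[2]),
        "max_lat": float(tokens[3]),
        "min_lon": float(tokens[4]),
        "max_lon": float(tokens[5]),
        "start_time": datetime.strptime(tokens[6], TIME_FORMAT),
        "end_time": datetime.strptime(tokens[7], TIME_FORMAT),
    }


# Made-up example values.
example = "3 36 -45.0 42.24 -24.0 60.0 1989-01-01 1989-12-31".split()
print(parse_rcmed_locator(example))
```

A date-only format has the advantage of containing no whitespace, so it would not interfere with splitting on spaces.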

h4. esgf data_source_locator_data
Each of these should be separated by a space.
* dataset_id
* Variable name
* ESGF username
* ESGF password

h3. optional_keyword_args
Any additional keyword args should be specified as a tuple after all of the 
required values have been specified. Again, these should be separated by a 
space from each other. Check the API docs for valid keyword args.

h2. Metrics

h2. Plotting

---

Thoughts?

A few of my concerns are:
* Can we use whitespace to separate multiple items that we're passing, and how
will we handle single elements that contain valid whitespace, such as file
paths? If we place elements in quotes, will that help with grouping? Should
we use a specific separator value to split everything?
* How should we pass the time formats for RCMED datasets?
* Can we pass keyword args as a tuple? Will this work?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
