[
https://issues.apache.org/jira/browse/CLIMATE-575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Work on CLIMATE-575 started by Michael Joyce.
---------------------------------------------
> Implement initial config based execution of an evaluation
> ---------------------------------------------------------
>
> Key: CLIMATE-575
> URL: https://issues.apache.org/jira/browse/CLIMATE-575
> Project: Apache Open Climate Workbench
> Issue Type: Task
> Components: general
> Affects Versions: 0.5
> Reporter: Michael Joyce
> Assignee: Michael Joyce
> Fix For: 1.0.0
>
>
> Brainstorming ideas for an initial config format for running an evaluation.
> One idea is below. Note that it doesn't yet cover all the functionality in
> the system. Empty sections are still a work in progress and will be filled
> in when possible.
> ---
> At the moment, the assumption is that there will be a single config file for
> one evaluation.
> h1. Sections
> There will be sections for
> * Datasets
> * Metrics
> * Plotting
> h2. Datasets
> Specified under a \[datasets\] tag, which lists every dataset to load. Each
> dataset is specified in the following format:
> eval_purpose_identifier: data_source_keyword data_source_locator_data
> optional_keyword_args
> h3. eval_purpose_identifier
> Either "reference" or "target". If there are multiple target datasets in the
> evaluation then they should all share the eval_purpose_identifier of "target".
> h3. data_source_keyword
> Specifies which data source will be used to load this dataset. The library
> currently supports "local", "dap", "rcmed", and "esgf".
> h3. data_source_locator_data
> Data necessary for loading the dataset. This varies based on the data source
> that will be used for loading this data. If you look at the docs for the data
> sources, these are effectively the required elements for loading a dataset.
> h4. local data_source_locator_data
> A local data source takes two parts, separated by a space.
> * Either the path to the file to load (for a single-file dataset), or the
> path to the directory holding multiple files, followed by the separator text
> (tentatively "###") and the glob pattern for the files to load.
> * The variable name
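
A minimal sketch of how the path part of a local entry might be split, assuming the tentative "###" separator is adopted as described. The function names are hypothetical and are not part of the library.

```python
import glob
import os

SEPARATOR = "###"  # tentative separator text from the proposal


def split_local_locator(path_spec):
    """Split a local path spec into (path, glob_pattern).

    glob_pattern is None when the spec names a single file.
    """
    if SEPARATOR in path_spec:
        directory, pattern = path_spec.split(SEPARATOR, 1)
        return directory, pattern
    return path_spec, None


def expand_local_files(path_spec):
    """Return the list of files a local path spec refers to."""
    path, pattern = split_local_locator(path_spec)
    if pattern is None:
        return [path]
    # Sort so that multi-file datasets load in a stable order.
    return sorted(glob.glob(os.path.join(path, pattern)))
```

With this sketch, "/data/models###*.nc" would mean "every .nc file in /data/models", while "/data/obs.nc" would be treated as a single file.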
> h4. dap data_source_locator_data
> Each of these should be separated by a space.
> * OpenDAP URL
> * Variable name
> h4. rcmed data_source_locator_data
> Each of these should be separated by a space.
> * dataset_id
> * parameter_id
> * min_lat
> * max_lat
> * min_lon
> * max_lon
> * start_time
> * end_time
> h4. esgf data_source_locator_data
> Each of these should be separated by a space.
> * dataset_id
> * variable name
> * esgf username
> * esgf password
> h3. optional_keyword_args
> Any additional keyword args should be specified as a tuple after all of the
> required values have been specified. Again, these should be separated by a
> space from each other. Check the API docs for valid keyword args.
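
To make the proposed line format concrete, here is a rough sketch of a parser for one dataset entry. It assumes plain whitespace splitting, so it does not handle values that contain spaces, and it takes the required-argument counts from the lists above. The function name, the returned dictionary layout, and the example URL are illustrative only.

```python
# Number of required locator values per data source, per the lists above.
REQUIRED_ARG_COUNTS = {"local": 2, "dap": 2, "rcmed": 8, "esgf": 4}
VALID_PURPOSES = ("reference", "target")


def parse_dataset_line(line):
    """Parse 'purpose: source locator... [optional args]' into a dict."""
    # Split on the first colon only, so URLs later in the line stay intact.
    purpose, _, rest = line.partition(":")
    purpose = purpose.strip()
    if purpose not in VALID_PURPOSES:
        raise ValueError("unknown eval_purpose_identifier: %r" % purpose)

    tokens = rest.split()
    if not tokens or tokens[0] not in REQUIRED_ARG_COUNTS:
        raise ValueError("missing or unknown data_source_keyword")

    source = tokens[0]
    count = REQUIRED_ARG_COUNTS[source]
    if len(tokens) - 1 < count:
        raise ValueError("%s needs %d locator values" % (source, count))

    return {
        "purpose": purpose,
        "source": source,
        "locator": tokens[1:1 + count],
        # Anything left over is treated as unparsed optional keyword args.
        "extra": tokens[1 + count:],
    }
```

For example, `parse_dataset_line("target: dap http://example.org/dods/model tas")` would yield a "dap" entry whose locator holds the URL and the variable name.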
> h2. Metrics
> h2. Plotting
> ---
> Thoughts?
> A few of my concerns are:
> * Can we use whitespace to separate multiple items that we're passing and how
> will we handle single elements which contain valid whitespace? For instance
> file paths. If we place elements in quotes will that help with grouping?
> Should we use a specific separator value to split everything?
> * How should we pass the time formats for RCMED datasets?
> * Can we pass keyword args as a tuple? Will this work?
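
On the whitespace and tuple questions, one possible approach (a sketch, not a decision) is shell-style quoting via Python's standard `shlex` module, with the keyword args written as a quoted tuple of pairs and read with `ast.literal_eval`. The helper names and the `name` keyword below are hypothetical examples, not confirmed API.

```python
import ast
import shlex


def tokenize(value):
    """Split on whitespace, keeping quoted elements together."""
    return shlex.split(value)


def parse_kwargs(token):
    """Turn a tuple-of-pairs literal into a keyword-argument dict."""
    # literal_eval only accepts Python literals, so no code is executed.
    return dict(ast.literal_eval(token))


# A local entry whose path contains a space, plus a quoted kwargs tuple.
line = '''local "/data/my models/tas.nc" tas "(('name', 'my_model'),)"'''
tokens = tokenize(line)
kwargs = parse_kwargs(tokens[3])
```

Here the quoted path survives as a single token, which suggests quoting may be enough without introducing a dedicated separator value. The RCMED time format question is left open.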