[jira] [Commented] (CLIMATE-825) Coalesce data sources into one module

ASF GitHub Bot (JIRA) Fri, 29 Jul 2016 11:51:38 -0700

    [ https://issues.apache.org/jira/browse/CLIMATE-825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15399845#comment-15399845 ]


ASF GitHub Bot commented on CLIMATE-825:
----------------------------------------

Github user agoodm commented on the issue:

    https://github.com/apache/climate/pull/374
  
    @huikyole I'm not sure I agree with your first suggestion. As I understood 
it, data_source was originally made a separate directory because there were 
multiple modules, one for loading from each data source. The modules in the 
base ocw directory each represent an individual step in the workflow, e.g. 
dataset processing, running evaluations, and plotting. The main step missing 
until now was loading the datasets, which is exactly what this module aims to 
do. So for now I think leaving it here is appropriate. I'd like @lewismc to 
share his thoughts on this too.
    
    I absolutely agree with your second suggestion, though. I originally set it 
up this way because, as I recall, the rest of the OCW codebase (particularly 
metrics and evaluations) was designed with a single "reference" dataset in 
mind. Given that Loikith et al. 2013 uses two reanalysis datasets, we should 
drop this rigid assumption not only for `dataset_loader.py` but potentially 
for `evaluation.py` as well. For now, changing the former obviously takes 
precedence, but we should consider exploring the latter too.
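
    As a rough illustration of dropping the single-reference assumption, here 
is a minimal sketch. The helper names `as_reference_list` and 
`evaluation_pairs` are hypothetical and are not part of `dataset_loader.py` or 
`evaluation.py`; plain strings stand in for OCW Dataset objects.

```python
def as_reference_list(reference):
    """Accept one reference dataset or a list/tuple of them; return a list."""
    if isinstance(reference, (list, tuple)):
        return list(reference)
    return [reference]


def evaluation_pairs(reference, targets):
    """Pair every reference dataset with every target dataset.

    Hypothetical helper: a real evaluation would compute metrics for each
    pair instead of just returning the pairs.
    """
    return [(ref, tgt)
            for ref in as_reference_list(reference)
            for tgt in targets]


# Strings stand in for Dataset objects in this sketch.
single = evaluation_pairs('ref_a', ['model_1', 'model_2'])
double = evaluation_pairs(['ref_a', 'ref_b'], ['model_1', 'model_2'])
```

    Normalising the argument this way would keep existing single-reference 
callers working while allowing two or more reference datasets.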


> Coalesce data sources into one module
> -------------------------------------
>
>                 Key: CLIMATE-825
>                 URL: https://issues.apache.org/jira/browse/CLIMATE-825
>             Project: Apache Open Climate Workbench
>          Issue Type: Improvement
>          Components: data sources
>    Affects Versions: 1.0.0
>            Reporter: Alex Goodman
>            Assignee: Alex Goodman
>             Fix For: 1.2.0
>
>
> Kyo and I will be working on overhauling the way data loading is handled in 
> the current RCMES workflow. Right now, the user manually specifies the 
> source of each dataset; sources currently fall into three categories: 
> local files on disk, the RCMES database (RCMED), and the Earth System Grid 
> Federation (ESGF). These cases are currently handled in separate modules / 
> function calls, but ideally there would be one universal function call for 
> all data loading. An example schematic would be something like:
> datasets = load(sources, ...)
> Here datasets would be a list of OCW Dataset objects, and sources would be a 
> list of source specifications, one for each requested dataset (e.g., 'esgf', 
> 'local', or 'rcmed'). Ideally we would also like better support for datasets 
> that span multiple files.
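
The schematic `datasets = load(sources, ...)` above could be sketched roughly 
as follows. This is only an illustration under stated assumptions: each source 
specification is assumed to be a dict with a `data_source` key plus 
loader-specific keyword arguments, and the registry, decorator, and placeholder 
loaders are all hypothetical names, not the OCW API. The placeholder loaders 
return tuples where a real loader would return an OCW Dataset object.

```python
from typing import Callable, Dict, List

# Hypothetical registry: source name -> loader function.
LOADERS: Dict[str, Callable[..., object]] = {}


def register_loader(name: str):
    """Decorator that registers a loader function under a source name."""
    def decorator(func):
        LOADERS[name] = func
        return func
    return decorator


@register_loader('local')
def _load_local(**kwargs):
    # Placeholder: a real loader would read files from disk.
    return ('local', kwargs)


@register_loader('rcmed')
def _load_rcmed(**kwargs):
    # Placeholder: a real loader would query the RCMES database.
    return ('rcmed', kwargs)


@register_loader('esgf')
def _load_esgf(**kwargs):
    # Placeholder: a real loader would download from ESGF.
    return ('esgf', kwargs)


def load(sources: List[dict]) -> list:
    """Dispatch each source specification to its registered loader."""
    datasets = []
    for spec in sources:
        spec = dict(spec)  # copy so the caller's dict is not mutated
        name = spec.pop('data_source')
        if name not in LOADERS:
            raise ValueError('Unknown data source: %r' % name)
        datasets.append(LOADERS[name](**spec))
    return datasets


datasets = load([
    {'data_source': 'local', 'file_path': 'a.nc'},
    {'data_source': 'esgf', 'dataset_id': 'x'},
])
```

A registry keeps the single entry point while leaving each source's loading 
logic in its own function, so adding a fourth source would not require 
changing `load` itself.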



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
