[ https://issues.apache.org/jira/browse/CLIMATE-825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15399845#comment-15399845 ]
ASF GitHub Bot commented on CLIMATE-825:
----------------------------------------

Github user agoodm commented on the issue:

    https://github.com/apache/climate/pull/374

    @huikyole Not sure if I agree with your first suggestion. I thought the
    intent of having data_source as a separate directory was originally due
    to having multiple modules for loading from each data source. The modules
    in the base ocw directory represent each of the individual steps in the
    workflow, e.g. dataset processing, running evaluations, and plotting. The
    main thing that was missing previously was loading the datasets, which is
    exactly what this module aims to do. So for now I think leaving it here
    is appropriate. I think @lewismc should share his thoughts on this too,
    though.

    I absolutely agree with your second suggestion though. I originally had
    it set up this way because, to my recollection, the rest of the OCW
    codebase (particularly metrics and evaluations) was designed with "one
    reference" dataset. Given that Loikith et al. 2013 uses two reanalysis
    datasets, we should get rid of this rigid assumption not only for
    `dataset_loader.py` but potentially for `evaluation.py` as well. I think
    for now changing the former obviously takes precedence, but we should
    consider exploring the latter as well.


> Coalesce data sources into one module
> -------------------------------------
>
>                 Key: CLIMATE-825
>                 URL: https://issues.apache.org/jira/browse/CLIMATE-825
>             Project: Apache Open Climate Workbench
>          Issue Type: Improvement
>          Components: data sources
>    Affects Versions: 1.0.0
>            Reporter: Alex Goodman
>            Assignee: Alex Goodman
>             Fix For: 1.2.0
>
>
> Kyo and I will be working on overhauling the way data loading is handled in
> the current RCMES workflow. Right now, the user manually specifies the
> sources for each dataset, which are currently separated into three
> categories: local files on disk, the RCMES database (RCMED), and the Earth
> System Grid (ESGF). These cases are currently handled in separate modules /
> function calls, but it would be ideal in the future to create one universal
> function call for all the data loading. An example schematic would be
> something like:
> datasets = load(sources, ...)
> Here datasets would be a list of OCW Dataset objects, and sources would be a
> list of source specifications for each requested dataset (e.g. 'esgf',
> 'local', or 'rcmed'). Ideally we would also like better support for
> datasets that span multiple files.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
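
For illustration only, the `datasets = load(sources, ...)` schematic in the issue description could be sketched as a small dispatcher along the following lines. This is not the code from the pull request: the per-source loader functions, the `LOADERS` registry, and the keys inside each source specification are all hypothetical stand-ins, and the stubs return plain dicts rather than OCW Dataset objects so that the sketch runs without OCW installed.

```python
from typing import Any, Callable, Dict, List

# Hypothetical stand-ins for the three per-source loaders. In OCW these
# would wrap the existing local, RCMED, and ESGF loading code and return
# Dataset objects; here they just echo their arguments.
def _load_local(**kwargs: Any) -> Dict[str, Any]:
    return {"source": "local", **kwargs}


def _load_rcmed(**kwargs: Any) -> Dict[str, Any]:
    return {"source": "rcmed", **kwargs}


def _load_esgf(**kwargs: Any) -> Dict[str, Any]:
    return {"source": "esgf", **kwargs}


# Registry mapping a source name to the function that handles it
# (an assumed design, not taken from the issue).
LOADERS: Dict[str, Callable[..., Dict[str, Any]]] = {
    "local": _load_local,
    "rcmed": _load_rcmed,
    "esgf": _load_esgf,
}


def load(sources: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Load one dataset per source specification, in order."""
    datasets = []
    for spec in sources:
        options = dict(spec)           # copy so the caller's spec is untouched
        name = options.pop("source")   # remaining keys go to the loader
        try:
            loader = LOADERS[name]
        except KeyError:
            raise ValueError(f"unknown data source: {name!r}") from None
        datasets.append(loader(**options))
    return datasets


# Example call; the option names are illustrative assumptions.
datasets = load([
    {"source": "local", "file_path": "obs.nc", "variable_name": "tas"},
    {"source": "rcmed", "dataset_id": 3, "parameter_id": 36},
])
```

Keeping the source name inside each specification means a single call can mix local, RCMED, and ESGF datasets, and supporting a new source would only require registering one more loader.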