[ 
https://issues.apache.org/jira/browse/CLIMATE-564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14693731#comment-14693731
 ] 

ASF GitHub Bot commented on CLIMATE-564:
----------------------------------------

Github user MJJoyce commented on a diff in the pull request:

    https://github.com/apache/climate/pull/223#discussion_r36878401
  
    --- Diff: ocw/data_source/local.py ---
    @@ -268,3 +268,38 @@ def load_file(file_path,
     
         return Dataset(lats, lons, times, values, variable=variable_name,
                        units=variable_unit, name=name, origin=origin)
    +
    +def load_multiple_files(data_info):
    --- End diff --
    
    I'm not the biggest fan of passing info this way. The config-related code 
should be handled in the proper package, and the necessary parameters should be 
passed into this function independent of the nested config format. Otherwise 
this function depends on the format of a config file that is subject to 
change, and it isn't very useful to call when you aren't using output from the 
config file.
    
    In other words:
    * This requires output from a parsed config file to function, which is 
confusing.
    * The format of the config file isn't specified here, so the user has no 
idea how to use this function without reading the config-related code. If 
they have to use the config code, then this functionality seems to belong in 
the config-related code.
    * If/when the format of the config file changes, this is liable to break, 
which is poor encapsulation.
    
    I think this would be much cleaner if this function had a defined interface 
that was independent of the config file instead of taking nested config output.
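
    As a rough sketch of what such an interface could look like: the loader 
below takes explicit, flat parameters, and a thin adapter on the config side is 
the only code that knows the nested layout. Every name here is hypothetical 
(the signature, the `loader` callable standing in for a single-file loader such 
as load_file, and the config keys "path", "variable", and "name_prefix"); none 
of it is taken from the actual pull request or the OCW API.

```python
import glob
import os


def load_multiple_files(file_pattern, variable_name, loader, name_prefix=""):
    """Load every file matching file_pattern through an explicit interface.

    Hypothetical sketch: `loader` is any callable with the shape
    loader(file_path, variable_name, name=...) that loads one file.
    """
    # Sort so the result order is deterministic across platforms.
    paths = sorted(glob.glob(file_pattern))
    if not paths:
        raise ValueError("No files match pattern: %s" % file_pattern)

    datasets = []
    for path in paths:
        # Derive a per-file name from the file name without its extension.
        base = os.path.splitext(os.path.basename(path))[0]
        datasets.append(loader(path, variable_name, name=name_prefix + base))
    return datasets


def load_from_config(data_info, loader):
    """Config-side adapter: the only place that reads the nested config.

    The keys used here are assumptions made for illustration.
    """
    return load_multiple_files(
        data_info["path"],
        data_info["variable"],
        loader,
        name_prefix=data_info.get("name_prefix", ""),
    )
```

    With this split, a change to the config format only touches 
load_from_config, and callers who never use a config file can call 
load_multiple_files directly with a glob pattern and a variable name.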



> Managing multiple netcdf files stored on a local machine
> --------------------------------------------------------
>
>                 Key: CLIMATE-564
>                 URL: https://issues.apache.org/jira/browse/CLIMATE-564
>             Project: Apache Open Climate Workbench
>          Issue Type: Improvement
>          Components: data sources
>    Affects Versions: 0.5
>            Reporter: Huikyo Lee
>            Assignee: Huikyo Lee
>             Fix For: 1.0.0
>
>
> Currently, the ocw.data_source.local.load_file function reads a single 
> netCDF file and generates an OCW dataset object. In most climate science use 
> cases, users need to read and process data from multiple files that follow the 
> same pattern (e.g., 3-hourly TRMM data, CMIP5 model output, and so on). I will 
> add another function, 'load_multiple_files' (tentative name), to handle daily 
> or sub-daily data and return an OCW dataset object. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
