On Sep 18, 2013, at 12:32 PM, Steve Hankin wrote:
> 
> 
> On 9/18/2013 7:56 AM, Roy Mendelssohn - NOAA Federal wrote:
>> Hi All:
>> 
>> NASA has used hierarchies for years, and appears committed to them.  So, 
>> either it is done in an ad hoc way, or through a standard.  That doesn't 
>> mean CF is the place for the standard, just that it would be nice to have 
>> one.
>> 
> 
> Roy,
> 
> Let's explore the avenue you have opened here:  "that doesn't mean CF is the 
> place for the standard".  The need for hierarchies as tools for programming 
> is indisputable.  But will hierarchical groups advance the interoperability 
> objectives of CF?  

Steve,
Speaking for myself, I use groups in data files to organize the various 
datasets so that a person looking at the file via the commandline (h5dump, 
ncdump) or application (HDFView, Panoply) can find the dataset they're 
interested in easily. For instance, in our swath-level (L2) data, we have a 
number of datasets that aren't really that relevant to our end users, but could 
come in handy when diagnosing a problem with the algorithm or to monitor 
algorithm performance. So that these diagnostic datasets don't clutter up the 
output, we've put them in a separate group from the main datasets.
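
For readers less familiar with groups, here is a minimal sketch of this kind of layout. It models a file as nested groups in plain Python rather than using the netCDF-4 or HDF5 libraries. The group and variable names (`geophysical_data`, `diagnostics`, `chlor_a`, `retrieval_iterations`, `quality_flags_raw`) are hypothetical placeholders, not our actual product layout. The point is only that a top-level listing stays short while the diagnostic datasets remain reachable by path.

```python
from dataclasses import dataclass, field


@dataclass
class Group:
    """Toy stand-in for a netCDF-4/HDF5 group: named variables plus child groups."""
    name: str
    variables: dict = field(default_factory=dict)  # variable name -> data values
    groups: dict = field(default_factory=dict)     # child group name -> Group

    def create_group(self, name: str) -> "Group":
        child = Group(name)
        self.groups[name] = child
        return child

    def top_level(self) -> list:
        """Names a browser would show first: variables, then child groups (with '/')."""
        return list(self.variables) + [g + "/" for g in self.groups]

    def lookup(self, path: str):
        """Resolve a slash-separated path such as '/diagnostics/retrieval_iterations'."""
        parts = [p for p in path.split("/") if p]
        node = self
        for part in parts[:-1]:
            node = node.groups[part]
        return node.variables[parts[-1]]


# Hypothetical layout (names are placeholders): end-user datasets and
# diagnostics live in separate groups.
root = Group("/")
root.create_group("geophysical_data").variables["chlor_a"] = [0.21, 0.35]
diag = root.create_group("diagnostics")
diag.variables["retrieval_iterations"] = [3, 5]
diag.variables["quality_flags_raw"] = [0, 1]

print(root.top_level())                                  # ['geophysical_data/', 'diagnostics/']
print(root.lookup("/diagnostics/retrieval_iterations"))  # [3, 5]
```

In a real file the same structure would be created with the group-creation calls of the netCDF-4 or HDF5 libraries. This toy class only mimics how a browser such as HDFView or Panoply would present it.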

So, in this case, do the groups make the files more interoperable? Not really, 
if we're talking about a completely software-driven system. But this *does* 
make them more user-friendly, and we'd also like to maximize their 
compatibility with those software-driven processes. Why not have the 
best of both worlds? Hence, I fully support CF incorporating groups into 
the conventions. I think Charlie's proposal is an excellent starting point.

Cheers,
-Corey

> At the start of this discussion I had assumed that there would be compelling 
> examples that supported the introduction of hierarchies to CF.  Thus far all 
> that have been put on display seem to be counter-examples(*):
>       • For CMIP5 any given hierarchy is an arbitrary, brittle 
> representation.  The CMIP5 collection is better modeled by facets (metadata 
> tags) than by hierarchies.
>       • The suitcase analogy serves best to illustrate the problems that 
> hierarchies can bring -- to locate the black socks in a suitcase usually 
> involves rummaging the entire suitcase. 
>               • ==>  Which speaks to Rich's valid concern that the 
> data-discovery-to-data-access transition may be very negatively impacted if 
> hierarchies are not used carefully.
>       • NASA hierarchies that are 10 levels deep strike me as by definition 
> an "insider" view of a data collection.  These hierarchies may add clarity 
> for the specific satellite program communicating with its designated science 
> groups, but they are likely a barrier to an outsider wanting to utilize the 
> data. 
> To move forward, we need to see some compelling use cases that will help us 
> understand the costs and benefits.
> 
>     - Steve
> 
> (*) with the exception of the Feature Collection types already contained in CF
> 
> =================================================
> 
>> I would point out that every major modern programming language has 
>> structures, which are essentially hierarchies.  Matlab was criticized for 
>> years for not having structures, and finally added them a few years back.  
>> R has them, C has them, Python has them, even modern Fortran has them.  So 
>> clearly there must be situations where hierarchies make sense, and are more 
>> efficient than having everything flat.  There are clearly situations where 
>> flattening everything makes sense.
>> 
>> My $0.02.
>> 
>> -Roy
>> 
>> 
>> 
>> On Sep 18, 2013, at 4:52 AM, "Signell, Richard" <[email protected]> wrote:
>> 
>> 
>>> All,
>>> 
>>> I'm glad we are discussing this topic, but the fact that large data
>>> providers are already distributing data using groups and hierarchies
>>> is not a compelling reason to endorse this practice through CF.  After
>>> all, a lot of data providers are currently distributing scientific
>>> data in any number of forms, and the point of CF (along with OGC
>>> standards) is to help clean up the mess!
>>> 
>>> I agree that groups make sense for metadata and for certain types of
>>> datasets.  For example, the discrete sampling geometry featureTypes
>>> like profile collection would be easier to understand and deal with as
>>> a netcdf4 group of profiles rather than as a netcdf3 ragged array.
>>> But for CF 1.6, backward compatibility was judged to be more
>>> important.
>>> 
>>> I don't think it's cowardly to believe that the more folks use groups
>>> to organize their data in an ad hoc way (the suitcase analogy), the
>>> more it will hinder the remarkable progress that has been made
>>> recently on finding and utilizing distributed CF data via the catalog
>>> services (e.g. the geonetwork, gi-cat, geoportal, CKAN instances) that
>>> many governments are setting up.   When we open the data service
>>> endpoints that our query returns, we need to have known data
>>> structures, and that's what the CF featureTypes provide.
>>> 
>>> To return to the suitcase/clothing analogy again, we are rapidly
>>> gaining the capability via good metadata and catalog services to find
>>> all the black socks owned by Jim and Martin that have been washed in
>>> the last week.  But if our catalog query returns fourteen of Jim's
>>> suitcases and twelve of Martin's, then we have more work to do.
>>> Unlike socks, luckily we don't need actual suitcases to organize data,
>>> we can construct collections on the fly using whatever attributes we
>>> desire.
>>> 
>>> I would hope that our job as the CF community would be to identify
>>> compelling additional specific featureTypes that we should support.
>>> And if these identified featureTypes demand groups for efficiency or
>>> some other reason, well, let's have that discussion.
>>> 
>>> -Rich
>>> 
>>> On Wed, Sep 18, 2013 at 12:08 AM, Roy Mendelssohn - NOAA Federal
>>> <[email protected]> wrote:
>>> 
>>>> Hi All:
>>>> 
>>>> I am old and slow, and I must be missing something, because at this point 
>>>> most of the discussion has been about the desirability of files with 
>>>> groups and hierarchies.  Again, unless I am missing something, there 
>>>> already are data providers who are distributing data using groups and 
>>>> hierarchies, including at least one very large data provider,  and they 
>>>> obviously feel that there is a benefit to such structures.  I am not 
>>>> arguing whether they are right or wrong, just that this is the reality.
>>>> 
>>>> If we start from that premise, then the real questions for discussion are, 
>>>> first, whether there should be conventions on how groups and hierarchies are 
>>>> used in netcdf4 and hdf5 files, so that a user or software provider will know 
>>>> what to expect; and second, if such conventions are deemed desirable, whether 
>>>> CF is the proper place for them to be developed.
>>>> 
>>>> My sense is that this is what the original proposers are after.
>>>> 
>>>> -Roy
>>>> 
>>>> 
>>>> **********************
>>>> "The contents of this message do not reflect any position of the U.S. 
>>>> Government or NOAA."
>>>> **********************
>>>> Roy Mendelssohn
>>>> Supervisory Operations Research Analyst
>>>> NOAA/NMFS
>>>> Environmental Research Division
>>>> Southwest Fisheries Science Center
>>>> 1352 Lighthouse Avenue
>>>> Pacific Grove, CA 93950-2097
>>>> 
>>>> e-mail: [email protected] (Note new e-mail address)
>>>> voice: (831)-648-9029
>>>> fax: (831)-648-8440
>>>> www: http://www.pfeg.noaa.gov/
>>>> 
>>>> 
>>>> "Old age and treachery will overcome youth and skill."
>>>> "From those who have been given much, much will be expected"
>>>> "the arc of the moral universe is long, but it bends toward justice" -MLK Jr.
>>>> 
>>>> _______________________________________________
>>>> CF-metadata mailing list
>>>> [email protected]
>>>> http://mailman.cgd.ucar.edu/mailman/listinfo/cf-metadata
>>> 
>>> 
>>> -- 
>>> Dr. Richard P. Signell   (508) 457-2229
>>> USGS, 384 Woods Hole Rd.
>>> Woods Hole, MA 02543-1598
>>> 

-- 
Corey Bettenhausen
Science Systems and Applications, Inc
NASA Goddard Space Flight Center
301 614 5383
[email protected]
