CF may well not be the proper place. I am not arguing that. All I am arguing is that the history of computer science shows that hierarchies are often beneficial compared to flat structures, whether in the b-trees we use in our directories or in the structures in programming languages, while a lot of the discussion has essentially been "everything should always be flat, there is no need for hierarchies, search will take care of everything".
I also know we deal with a lot of NASA data (files) and they are all over the map in their structure and how the hierarchy is used. So from an end user's point of view, having some standard on how it is done and what we can expect would be a great help. There is a use case - whether that is compelling to CF is a different matter. And I think this is where Zender et al are coming from, because they see the same thing in the NASA data files. Another thing to consider is that netcdf4 now does have groups and hierarchical structures, and down the line, in particular I believe for in situ data, they make a lot of sense (I can give practical examples of this if people want). So there is some good rationale to develop best practices for their use, before the use becomes common.

My $0.02.

-Roy

On Sep 18, 2013, at 9:32 AM, Steve Hankin <[email protected]> wrote:

>
>
> On 9/18/2013 7:56 AM, Roy Mendelssohn - NOAA Federal wrote:
>> Hi All:
>>
>> NASA has used hierarchies for years, and appears committed to them. So,
>> either it is done in an ad hoc way, or through a standard. That doesn't
>> mean CF is the place for the standard, just that it would be nice to have
>> one.
>>
>
> Roy,
>
> Let's explore the avenue you have opened here: "that doesn't mean CF is the
> place for the standard". The need for hierarchies as tools for programming
> is indisputable. But will hierarchical groups advance the interoperability
> objectives of CF? At the start of this discussion I had assumed that there
> would be compelling examples that supported the introduction of hierarchies
> to CF. Thus far all that have been put on display seem to be
> counter-examples(*):
> • For CMIP5 any given hierarchy is an arbitrary, brittle
> representation. The CMIP5 collection is better modeled by facets (metadata
> tags) than by hierarchies.
> • The suitcase analogy serves best to illustrate the problems that
> hierarchies can bring -- to locate the black socks in a suitcase usually
> involves rummaging the entire suitcase.
> • ==> Which speaks to Rich's valid concern that the
> data-discovery-to-data-access transition may be very negatively impacted if
> hierarchies are not used carefully.
> • NASA hierarchies that are 10 levels deep strike me as by definition
> an "insider" view of a data collection. These hierarchies may add clarity
> for the specific satellite program communicating with its designated science
> groups, but they are likely a barrier to an outsider wanting to utilize the
> data.
> To proceed forward we need to see some compelling use cases that will help us
> to understand the costs and benefits.
>
> - Steve
>
> (*) with the exception of Feature Collections types already contained in CF
>
> =================================================
>
>> I would point out that every major modern programming language has
>> structures, which are essentially hierarchies. Matlab was criticized for
>> years about not having structures, and finally added them a few years back.
>> R has them, C has them, Python has them, even modern Fortran has them. So
>> clearly there must be situations where hierarchies make sense, and are more
>> efficient than having everything flat. There are clearly situations where
>> flattening everything makes sense.
>>
>> My $0.02.
>>
>> -Roy
>>
>>
>>
>> On Sep 18, 2013, at 4:52 AM, "Signell, Richard"
>> <[email protected]>
>> wrote:
>>
>>
>>> All,
>>>
>>> I'm glad we are discussing this topic, but the fact that large data
>>> providers are already distributing data using groups and hierarchies
>>> is not a compelling reason to endorse this practice through CF. After
>>> all, a lot of data providers are currently distributing scientific
>>> data in any number of forms, and the point of CF (along with OGC
>>> standards) is to help clean up the mess!
>>>
>>> I agree that groups make sense for metadata and for certain types of
>>> datasets. For example, the discrete sampling geometry featureTypes
>>> like profile collection would be easier to understand and deal with as
>>> a netcdf4 group of profiles rather than as a netcdf3 ragged array.
>>> But the choice was made for CF 1.6 that backward compatibility was
>>> more important.
>>>
>>> I don't think it's cowardly to believe that the more folks use groups
>>> to organize their data in an ad hoc way (the suitcase analogy), the
>>> more it will hinder the remarkable progress that has been made
>>> recently on finding and utilizing distributed CF data via the catalog
>>> services (e.g. the geonetwork, gi-cat, geoportal, CKAN instances) that
>>> many governments are setting up. When we open the data service
>>> endpoints that our query returns, we need to have known data
>>> structures, and that's what the CF featureTypes provide.
>>>
>>> To return to the suitcase/clothing analogy again, we are rapidly
>>> gaining the capability via good metadata and catalog services to find
>>> all the black socks owned by Jim and Martin that have been washed in
>>> the last week. But if our catalog query returns fourteen of Jim's
>>> suitcases and twelve of Martin's, then we have more work to do.
>>> Unlike socks, luckily we don't need actual suitcases to organize data,
>>> we can construct collections on the fly using whatever attributes we
>>> desire.
>>>
>>> I would hope that our job as the CF community would be to identify
>>> compelling additional specific featureTypes that we should support.
>>> And if these identified featureTypes demand groups for efficiency or
>>> some other reason, well, let's have that discussion.
>>>
>>> -Rich
>>>
>>> On Wed, Sep 18, 2013 at 12:08 AM, Roy Mendelssohn - NOAA Federal
>>>
>>> <[email protected]>
>>> wrote:
>>>
>>>> Hi All:
>>>>
>>>> I am old and slow, and I must be missing something, because at this point
>>>> most of the discussion has been about the desirability of files with
>>>> groups and hierarchies. Again, unless I am missing something, there
>>>> already are data providers who are distributing data using groups and
>>>> hierarchies, including at least one very large data provider, and they
>>>> obviously feel that there is a benefit to such structures. I am not
>>>> arguing whether they are right or wrong, just that is the reality.
>>>>
>>>> If we start from that premise, then the real questions for discussion are
>>>> should there be conventions on how groups and hierarchies are used in
>>>> netcdf4 and hdf5 files, so that a user or software provider will know what
>>>> to expect, and the second question is if it is deemed desirable to have
>>>> such conventions, is CF the proper place for them to be developed.
>>>>
>>>> My sense is that this is what the original proposers are after.
>>>>
>>>> -Roy
>>>>
>>>>
>>>> **********************
>>>> "The contents of this message do not reflect any position of the U.S.
>>>> Government or NOAA."
>>>> **********************
>>>> Roy Mendelssohn
>>>> Supervisory Operations Research Analyst
>>>> NOAA/NMFS
>>>> Environmental Research Division
>>>> Southwest Fisheries Science Center
>>>> 1352 Lighthouse Avenue
>>>> Pacific Grove, CA 93950-2097
>>>>
>>>> e-mail:
>>>> [email protected]
>>>> (Note new e-mail address)
>>>> voice: (831)-648-9029
>>>> fax: (831)-648-8440
>>>> www:
>>>> http://www.pfeg.noaa.gov/
>>>>
>>>>
>>>> "Old age and treachery will overcome youth and skill."
>>>> "From those who have been given much, much will be expected"
>>>> "the arc of the moral universe is long, but it bends toward justice" -MLK
>>>> Jr.
>>>>
>>>> _______________________________________________
>>>> CF-metadata mailing list
>>>>
>>>> [email protected]
>>>> http://mailman.cgd.ucar.edu/mailman/listinfo/cf-metadata
>>>
>>>
>>> --
>>> Dr. Richard P. Signell (508) 457-2229
>>> USGS, 384 Woods Hole Rd.
>>> Woods Hole, MA 02543-1598
>>>
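
A minimal sketch, in Python, of the facets-versus-hierarchy contrast raised in this thread (the CMIP5 point about metadata tags and the suitcase analogy). Everything here is a hypothetical illustration: the records, the field names, and the helpers facet_query and walk are invented for the example and are not part of CF, netCDF, or any catalog service. It only shows that a flat, tagged collection can be filtered on whatever attributes we like, while one fixed hierarchy has to be walked in full before the same question can be answered.

```python
from typing import Any, Dict, Iterator, List

Record = Dict[str, Any]

# Hypothetical flat catalog: every item carries its own facets (metadata tags).
FLAT: List[Record] = [
    {"owner": "Jim", "item": "socks", "color": "black", "days_since_wash": 3},
    {"owner": "Jim", "item": "shirt", "color": "white", "days_since_wash": 10},
    {"owner": "Martin", "item": "socks", "color": "black", "days_since_wash": 5},
    {"owner": "Martin", "item": "socks", "color": "red", "days_since_wash": 2},
]

# The same items packed into one arbitrary hierarchy: owner -> suitcase -> items.
NESTED: Dict[str, Dict[str, List[Record]]] = {
    "Jim": {"suitcase_1": [FLAT[0]], "suitcase_2": [FLAT[1]]},
    "Martin": {"suitcase_1": [FLAT[2], FLAT[3]]},
}


def facet_query(records: List[Record], **facets: Any) -> List[Record]:
    """Return the records whose tags equal every requested facet value."""
    return [r for r in records if all(r.get(k) == v for k, v in facets.items())]


def walk(node: Any) -> Iterator[Record]:
    """Yield every leaf record by visiting the whole hierarchy ("rummaging")."""
    if isinstance(node, list):
        yield from node
    else:
        for child in node.values():
            yield from walk(child)


# Flat case: ask directly for black socks washed within the last week.
flat_hits = [
    r for r in facet_query(FLAT, item="socks", color="black")
    if r["days_since_wash"] <= 7
]

# Nested case: the hierarchy is keyed by owner and suitcase, not by color or
# item type, so every suitcase has to be opened before filtering.
nested_hits = [
    r for r in facet_query(list(walk(NESTED)), item="socks", color="black")
    if r["days_since_wash"] <= 7
]

print([r["owner"] for r in flat_hits])
print(nested_hits == flat_hits)
```

Both queries find the same items; the difference is that the nested layout only makes the owner/suitcase question cheap, and any other question falls back to a full traversal. This says nothing about cases where a hierarchy matches the natural access pattern, which is the situation Roy has in mind for in situ data.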
