On Sep 18, 2013, at 12:32 PM, Steve Hankin wrote:

>
>
> On 9/18/2013 7:56 AM, Roy Mendelssohn - NOAA Federal wrote:
>> Hi All:
>>
>> NASA has used hierarchies for years, and appears committed to them. So,
>> either it is done in an ad hoc way, or through a standard. That doesn't
>> mean CF is the place for the standard, just that it would be nice to have
>> one.
>>
>
> Roy,
>
> Let's explore the avenue you have opened here: "that doesn't mean CF is the
> place for the standard". The need for hierarchies as tools for programming
> is indisputable. But will hierarchical groups advance the interoperability
> objectives of CF?
Steve,

Speaking for myself, I use groups in data files to organize the various datasets so that a person looking at the file via the command line (h5dump, ncdump) or an application (HDFView, Panoply) can easily find the dataset they're interested in. For instance, in our swath-level (L2) data, we have a number of datasets that aren't especially relevant to our end users but could come in handy when diagnosing a problem with the algorithm or monitoring algorithm performance. So that these diagnostic datasets don't clutter up the output, we've put them into a separate group from the main datasets.

So, in this case, do the groups make the files more interoperable? Not really, if we're talking about a completely software-driven system. But they *do* make the files more user-friendly, and we'd definitely like to maximize our compatibility with those software-driven processes as well. Why not have the best of both worlds? Hence, I fully support CF incorporating groups into the conventions. I think Charlie's proposal is an excellent starting point.

Cheers,
-Corey

> At the start of this discussion I had assumed that there would be compelling
> examples that supported the introduction of hierarchies to CF. Thus far all
> that have been put on display seem to be counter-examples(*):
> • For CMIP5 any given hierarchy is an arbitrary, brittle
> representation. The CMIP5 collection is better modeled by facets (metadata
> tags) than by hierarchies.
> • The suitcase analogy serves best to illustrate the problems that
> hierarchies can bring -- to locate the black socks in a suitcase usually
> involves rummaging through the entire suitcase.
> • ==> Which speaks to Rich's valid concern that the
> data-discovery-to-data-access transition may be very negatively impacted if
> hierarchies are not used carefully.
> • NASA hierarchies that are 10 levels deep strike me as by definition
> an "insider" view of a data collection.
> These hierarchies may add clarity
> for the specific satellite program communicating with its designated science
> groups, but they are likely a barrier to an outsider wanting to utilize the
> data.
> To move forward, we need to see some compelling use cases that will help us
> understand the costs and benefits.
>
> - Steve
>
> (*) with the exception of Feature Collections types already contained in CF
>
> =================================================
>
>> I would point out that every major modern programming language has
>> structures, which are essentially hierarchies. Matlab was criticized for
>> years about not having structures, and finally added them a few years back.
>> R has them, C has them, Python has them, even modern Fortran has them. So
>> clearly there must be situations where hierarchies make sense, and are more
>> efficient than having everything flat. There are clearly situations where
>> flattening everything makes sense.
>>
>> My $0.02.
>>
>> -Roy
>>
>>
>>
>> On Sep 18, 2013, at 4:52 AM, "Signell, Richard"
>> <[email protected]>
>> wrote:
>>
>>
>>> All,
>>>
>>> I'm glad we are discussing this topic, but the fact that large data
>>> providers are already distributing data using groups and hierarchies
>>> is not a compelling reason to endorse this practice through CF. After
>>> all, a lot of data providers are currently distributing scientific
>>> data in any number of forms, and the point of CF (along with OGC
>>> standards) is to help clean up the mess!
>>>
>>> I agree that groups make sense for metadata and for certain types of
>>> datasets. For example, the discrete sampling geometry featureTypes
>>> like profile collection would be easier to understand and deal with as
>>> a netcdf4 group of profiles rather than as a netcdf3 ragged array.
>>> But the choice was made for CF 1.6 that backward compatibility was
>>> more important.
>>>
>>> I don't think it's cowardly to believe that the more folks use groups
>>> to organize their data in an ad hoc way (the suitcase analogy), the
>>> more it will hinder the remarkable progress that has been made
>>> recently on finding and utilizing distributed CF data via the catalog
>>> services (e.g. the geonetwork, gi-cat, geoportal, CKAN instances) that
>>> many governments are setting up. When we open the data service
>>> endpoints that our query returns, we need to have known data
>>> structures, and that's what the CF featureTypes provide.
>>>
>>> To return to the suitcase/clothing analogy again, we are rapidly
>>> gaining the capability via good metadata and catalog services to find
>>> all the black socks owned by Jim and Martin that have been washed in
>>> the last week. But if our catalog query returns fourteen of Jim's
>>> suitcases and twelve of Martin's, then we have more work to do.
>>> Luckily, unlike socks, data don't need actual suitcases to be organized;
>>> we can construct collections on the fly using whatever attributes we
>>> desire.
>>>
>>> I would hope that our job as the CF community would be to identify
>>> compelling additional specific featureTypes that we should support.
>>> And if these identified featureTypes demand groups for efficiency or
>>> some other reason, well, let's have that discussion.
>>>
>>> -Rich
>>>
>>> On Wed, Sep 18, 2013 at 12:08 AM, Roy Mendelssohn - NOAA Federal
>>>
>>> <[email protected]>
>>> wrote:
>>>
>>>> Hi All:
>>>>
>>>> I am old and slow, and I must be missing something, because at this point
>>>> most of the discussion has been about the desirability of files with
>>>> groups and hierarchies. Again, unless I am missing something, there
>>>> already are data providers who are distributing data using groups and
>>>> hierarchies, including at least one very large data provider, and they
>>>> obviously feel that there is a benefit to such structures.
>>>> I am not
>>>> arguing whether they are right or wrong, just that this is the reality.
>>>>
>>>> If we start from that premise, then the real questions for discussion are,
>>>> first, whether there should be conventions on how groups and hierarchies
>>>> are used in netcdf4 and hdf5 files, so that a user or software provider
>>>> will know what to expect; and second, if such conventions are deemed
>>>> desirable, whether CF is the proper place for them to be developed.
>>>>
>>>> My sense is that this is what the original proposers are after.
>>>>
>>>> -Roy
>>>>
>>>>
>>>> **********************
>>>> "The contents of this message do not reflect any position of the U.S.
>>>> Government or NOAA."
>>>> **********************
>>>> Roy Mendelssohn
>>>> Supervisory Operations Research Analyst
>>>> NOAA/NMFS
>>>> Environmental Research Division
>>>> Southwest Fisheries Science Center
>>>> 1352 Lighthouse Avenue
>>>> Pacific Grove, CA 93950-2097
>>>>
>>>> e-mail:
>>>> [email protected]
>>>> (Note new e-mail address)
>>>> voice: (831)-648-9029
>>>> fax: (831)-648-8440
>>>> www:
>>>> http://www.pfeg.noaa.gov/
>>>>
>>>>
>>>> "Old age and treachery will overcome youth and skill."
>>>> "From those who have been given much, much will be expected"
>>>> "the arc of the moral universe is long, but it bends toward justice" -MLK
>>>> Jr.
>>>>
>>>> _______________________________________________
>>>> CF-metadata mailing list
>>>>
>>>> [email protected]
>>>> http://mailman.cgd.ucar.edu/mailman/listinfo/cf-metadata
>>>
>>>
>>> --
>>> Dr. Richard P. Signell (508) 457-2229
>>> USGS, 384 Woods Hole Rd.
>>> Woods Hole, MA 02543-1598
>>>

-- 
Corey Bettenhausen
Science Systems and Applications, Inc
NASA Goddard Space Flight Center
301 614 5383
[email protected]
