Charlie & Co,

Also, regardless of whether these hierarchical structures are stored in NetCDF4 or flattened NetCDF3, we get a big boost in interoperability when we write datasets with known featureTypes (profile, time series collection, swath, etc.), because then workflows that have performed a catalog search and returned dataset endpoints know what to do. If the dataset endpoints contain ad hoc hierarchies, it will be a lot more difficult.
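Rich's point holds whether a hierarchy lives in netCDF4 groups or is flattened for netCDF3. As a rough illustration of the flattening route, the dot-appending practice mentioned in Steve's quoted message (group.subgroup.child) can be sketched in Python. This is a minimal sketch under stated assumptions, not the behavior of NCO or any netCDF library: it works on a hypothetical in-memory nested dict with "variables" and "groups" keys, and it ignores group attributes and dimensions entirely.

```python
from typing import Any, Dict

# Hypothetical nested model of a grouped file (NOT a netCDF4-python object):
# {"variables": {name: data, ...}, "groups": {name: <same shape>, ...}}
Tree = Dict[str, Any]


def flatten(tree: Tree, prefix: str = "", sep: str = ".") -> Dict[str, Any]:
    """Flatten a group hierarchy into one namespace by dot-appending names.

    A variable `ps` in group /ccsm/atm becomes `ccsm.atm.ps`.
    Group attributes, dimensions, and empty groups are dropped in this sketch.
    """
    flat: Dict[str, Any] = {}
    for var_name, data in tree.get("variables", {}).items():
        flat[prefix + var_name] = data
    for group_name, subtree in tree.get("groups", {}).items():
        # Recurse, extending the prefix with this group's name plus the separator.
        flat.update(flatten(subtree, prefix + group_name + sep, sep))
    return flat


def unflatten(flat: Dict[str, Any], sep: str = ".") -> Tree:
    """Rebuild the nested form; unambiguous only if no original name contains `sep`."""
    tree: Tree = {}
    for full_name, data in flat.items():
        *groups, var_name = full_name.split(sep)
        node = tree
        for g in groups:
            node = node.setdefault("groups", {}).setdefault(g, {})
        node.setdefault("variables", {})[var_name] = data
    return tree


if __name__ == "__main__":
    tree = {
        "variables": {"time": [0, 1]},
        "groups": {
            "ccsm": {
                "variables": {"tas": [280.1, 281.0]},
                "groups": {"atm": {"variables": {"ps": [1013.0, 1012.5]}}},
            },
            "giss": {"variables": {"tas": [279.8, 280.4]}},
        },
    }
    flat = flatten(tree)
    print(sorted(flat))  # ['ccsm.atm.ps', 'ccsm.tas', 'giss.tas', 'time']
    assert unflatten(flat) == tree
```

The round trip is lossless only if no original group or variable name already contains the separator, and empty groups silently disappear; a real converter would also have to decide where group attributes and group-scoped dimensions go, which is exactly the information a known featureType would help a downstream workflow interpret.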
So if we create hierarchical datasets, I hope we create known featureTypes to accompany them (like gridEnsembleStructure).

Thanks,
Rich

On Mon, Sep 16, 2013 at 12:53 PM, Steve Hankin <[email protected]> wrote:
> Hi Charlie,
>
> Great that you have opened the door onto this discussion topic. Total agreement from my pov that "group-awareness" in CF is an area that is crying to be explored and solved. Your analysis of technical details -- e.g. attribute scope and inheritance by group descendants, etc. -- sounds natural and sensible.
>
> The principal barrier to moving forward along this path lies in the fact that CF is heavily committed to interoperability. Arguably interoperability is the raison d'être of CF. The style of "backwards compatibility" that one gets through a headlong switch from the netCDF3 API into the group-oriented elements of the netCDF4 API is the most extreme sort of one-way trap door. It leads to next-generation files that are utterly inaccessible to previous-generation applications. This style of advancement, which heavily degrades interoperability from a community-wide perspective, should only be undertaken IMHO if all reasonable alternatives are exhausted. I welcome discussion on this "philosophical" point.
>
> So are there reasonable alternatives to these negative impacts on interoperability? It is common practice to flatten groups by dot-appending the name hierarchy: group.subgroup.child. One could certainly envision utilities (in the style of your nco) that could convert a netCDF4-CF file into a netCDF-3 CF file. Could such a translation layer be made available as a Web service? Not the worst answer .... A question I'd like to see discussed (primarily to Unidata, I guess): how difficult would it be to make accommodations in the netCDF API itself, so that netCDF4 groups were accessible through the netCDF-3 API?
> If such enhancements could be baked into the netCDF code, the character of the interoperability impacts of adding group-aware elements to CF would be utterly changed. This would open the door wide to group-aware CF. Has an analysis of this been done?
>
> - Steve
>
> ==============================================
>
> On 9/15/2013 6:53 PM, Charlie Zender wrote:
>>
>> NASA has recently convened an Earth Science Data System Working Group to explore existing conventions for data and products stored in HDF and to make recommendations for future developments. The CF Conventions are an important element in this work, as many scientists and users are interested in data products that comply with CF. Many members of the working group are familiar with CF and have been involved in attempts to apply the CF Conventions to a variety of Earth Science data products.
>>
>> We have identified a persistent barrier to NASA's greater adoption of CF: the lack of protocols for exploiting software-defined group hierarchies for data structures. HDF datasets traditionally collected and stewarded by NASA often use hierarchical (the "H" in HDF) groups. A chief advantage of netCDF4 over netCDF3 is that it supports a group API compatible with HDF. Here we outline an approach to incorporating groups into CF as a step towards recognizing and, eventually, exploiting groups.
>>
>> Some aspects of CF (especially the netCDF Conventions like _FillValue, valid_min) can apply unambiguously to HDF files that use groups, but other aspects of the CF Conventions leave room for ambiguity when applied to such HDF files. Clarifying that ambiguity is one role of conventions, so we would like to start a discussion with the aim of obtaining feedback, gathering consensus, and eventually, possibly, embedding "group-awareness" into CF.
>> Unidata's white paper on Conventions for netCDF4 (http://www.unidata.ucar.edu/software/netcdf/papers/nc4_conventions.html) began the discussion of potential "group-aware" CF capabilities. Some previous discussion of "group-aware" CF metadata is contained or referenced in CF-Metadata Trac tickets 79 (Handling and formatting of vector quantities in CF) and 90 (Collection of CF enhancements for interoperable applications), yet the "big discussion" on how/whether CF should exploit the hierarchical group capabilities of netCDF4 is unfinished. Below we propose a standard scheme for interpreting metadata scope in hierarchical (group) files, and suggest one or two new Group Attributes which we could turn into concrete proposals if interest warrants.
>>
>> Perhaps the most obvious place to start a discussion on making CF "group-aware" is the notion of attribute scope: how ought metadata in one group apply, if at all, to other groups? CF metadata attributes may be applied at the group level (netCDF4 allows this), yet what should that mean? Whereas the current CF Conventions speak only of Global Attributes and Variable Attributes, a "group-aware" CF must explicitly define the properties of a third category of attributes, Group Attributes. Global Attributes are a special case of Group Attributes and should share their properties.
>>
>> The key technical definition we propose is that Group Attributes shall apply to the group where they are defined and to its descendants, but not to that group's ancestors or siblings. Group Attributes apply to all of a group's descendants recursively, with one exception: any group may redefine an attribute defined in an ancestor group, and that child group's definition applies to all its descendants. Thus, in cases where multiple ancestor groups define the same attribute, attribute values are inherited from the nearest ancestor.
>> Note that these are the same scoping properties as netCDF4 dimensions.
>>
>> Our understanding is that this proposal is consistent and backwards-compatible with CF. However, it would extend the current usage of CF to files with arbitrary hierarchies of groups. Moreover, it might be helpful to specifically disallow (or mark as having undefined consequences) the use of Group Attributes to store metadata that should always be attached directly to variables. Group Attributes such as _FillValue, scale_factor, and valid_min might sometimes seem tempting, yet might create more problems than they would solve. Some attributes (e.g., Conventions) may be useful only as Global Attributes, and not as Group Attributes for other groups.
>>
>> What would a "group-aware" CF Convention mean in practice? It is important to preserve CF backwards compatibility. The metadata annotation of flat files (e.g., all netCDF3 files) need not be affected by any "group-aware" CF Convention extensions.
>>
>> Files with group hierarchies would continue to have Global Attributes (i.e., Group Attributes at the root group level). Global Attributes are almost always useful because they apply to the entire file except where superseded by an attribute of the same name at a lower level. Where group-oriented attribute conventions would help, we believe, is in extending the power of CF unambiguously to nested groups.
>>
>> Imagine a group file in which each top-level group holds model results from a distinct CMIP5 simulation (CCSM, ECMWF, GISS, etc.), or one in which each top-level group holds a different satellite-retrieved value of the same field (ERBE OLR, CERES OLR, etc.), or a different channel from the same multi-spectral radiometer. It may be helpful to know the relation of groups to other groups, so that users and tools can learn which are (or aren't) intercomparable or aggregable.
>> Properties of ensembles stored as groups that analysis tools (such as NCO) would find helpful to know in an automated way include: Which groups contain the other ensemble realizations? Which groups hold other channels of a multi-spectral instrument? Knowing this information would help users and analysis tools infer how best to create ensemble statistics, and could significantly reduce the overall number of files confronting users.
>>
>> Finally, groups allow containerization of information, which can be useful in avoiding repetition. Some would like to define metadata-only groups that could then be logically attached to apply to some or all other groups in a file. Is it desirable for CF to define a standard way to indicate this?
>>
>> As the previous examples illustrate, there are at least two levels to a discussion about "group-aware" CF. The first is scope, i.e., how attribute meanings are inherited in hierarchies. The second is the more pragmatic issue of what new CF attributes would allow us to exploit group hierarchies in a systematic way. We proposed an answer to the scope issue to kickstart the discussion. We illustrated how a new attribute (call it "ensemble" for now) might be useful. At this stage we wish to learn whether CF users/developers are interested in pursuing "group-aware" CF extensions at all before we develop more details/wording for specific conventions. Perhaps there are others working on similar issues, or perhaps the CF maintainers prefer to receive specific wording of proposals rather than more diffuse "invitations to discuss" like this. If you have an opinion, then please let us know.
>>
>> Until the CF (or some other) Convention tackles the issues of scoping and Group Attributes, such annotations will be ad hoc. Our goal is to increase interoperability, and we are eager to hear responses from the CF community on the direction of "group-aware" extensions to CF.
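The scoping rule proposed in the quoted message (a Group Attribute applies to its own group and all descendants, never to ancestors or siblings, and the nearest ancestor's definition wins) can be sketched as a lookup that walks from a group toward the root. This is a minimal illustration, not an existing netCDF or CF API: the `Group` class and its methods are hypothetical, and `VARIABLE_ONLY` collects the attributes the proposal suggests keeping off groups (_FillValue, scale_factor, valid_min) plus a few close relatives added here as an assumption.

```python
from typing import Any, Dict, List, Optional

# Hypothetical set of attributes that, per the proposal, belong on variables
# rather than groups. add_offset, valid_max and valid_range are an assumption
# added for illustration beyond the three named in the proposal.
VARIABLE_ONLY = {"_FillValue", "scale_factor", "add_offset",
                 "valid_min", "valid_max", "valid_range"}


class Group:
    """Hypothetical in-memory group node; the root group holds the Global Attributes."""

    def __init__(self, name: str, attrs: Optional[Dict[str, Any]] = None,
                 parent: Optional["Group"] = None) -> None:
        self.name = name
        self.attrs: Dict[str, Any] = dict(attrs or {})
        self.parent = parent
        self.children: Dict[str, "Group"] = {}

    def add_child(self, name: str, attrs: Optional[Dict[str, Any]] = None) -> "Group":
        child = Group(name, attrs, parent=self)
        self.children[name] = child
        return child

    def resolve(self, attr: str) -> Any:
        """Return the in-scope value of `attr` for this group.

        Walks from this group up to the root and returns the first definition
        found, so the nearest ancestor wins. Siblings and descendants are never
        consulted. Raises KeyError if no group on the path defines `attr`.
        """
        group: Optional[Group] = self
        while group is not None:
            if attr in group.attrs:
                return group.attrs[attr]
            group = group.parent
        raise KeyError(attr)

    def effective_attrs(self) -> Dict[str, Any]:
        """All attributes in scope here, merged root-first so nearer groups override."""
        chain: List[Group] = []
        group: Optional[Group] = self
        while group is not None:
            chain.append(group)
            group = group.parent
        merged: Dict[str, Any] = {}
        for g in reversed(chain):  # root first, this group last
            merged.update(g.attrs)
        return merged

    def misplaced_attrs(self) -> List[str]:
        """Names of variable-only attributes defined directly on this group."""
        return sorted(a for a in self.attrs if a in VARIABLE_ONLY)


if __name__ == "__main__":
    root = Group("/", {"Conventions": "CF-1.6", "title": "Multi-model ensemble"})
    ccsm = root.add_child("ccsm", {"title": "CCSM run"})
    giss = root.add_child("giss", {"_FillValue": -999.0})
    atm = ccsm.add_child("atm")

    print(atm.resolve("title"))        # -> CCSM run  (nearest ancestor wins)
    print(giss.resolve("title"))       # -> Multi-model ensemble  (sibling ccsm ignored)
    print(atm.resolve("Conventions"))  # -> CF-1.6  (Global Attribute reaches every group)
    print(giss.misplaced_attrs())      # -> ['_FillValue']
```

Walking upward per lookup mirrors the nearest-ancestor rule directly, and `effective_attrs` gives the same answer for every name at once by merging root-first. Note that `misplaced_attrs` only flags a problem; in this sketch a misplaced attribute is still inherited, which is one reason the proposal suggests disallowing or leaving such usage undefined.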
>>
>> On behalf of the NASA ESDS HDF5 WG,
>> Charlie Zender, Ted Habermann, and Peter Leonard
>
> _______________________________________________
> CF-metadata mailing list
> [email protected]
> http://mailman.cgd.ucar.edu/mailman/listinfo/cf-metadata

--
Dr. Richard P. Signell (508) 457-2229
USGS, 384 Woods Hole Rd.
Woods Hole, MA 02543-1598
