All, the changes described below were just submitted as a PR to Daniel's tree, which hopefully means they will modify #145 once accepted. I'm no GitHub expert, so perhaps that is inefficient. The original is viewable in my tree at https://github.com/czender/cf-conventions/tree/groups

> I'm responding to your comments inline and abridging as I determine relevant;
> if there's anything you're missing, let me know.
>
>> ... I'm not convinced about the lateral search. You don't sound convinced
>> about it either, in saying that it's allowed for backward-compatibility and
>> may be deprecated in future. Why not deprecate it now (meaning that the
>> CF-checker would give a warning)? Why allow it at all? Is there a more
>> specific extra search rule you could provide instead of your general lateral
>> algorithm to deal with the existing datasets you have in mind?
>
> I have to agree with you on this, and this is in line with my understanding
> of the discussion at the meeting in Reading. @czender could you say a few
> words on this?

I must use many words. I wish I had been at the Reading conference to say this in person: I may be the only defender of the lateral search feature so it falls on me to make the case for it.

The one word that best summarizes why I think lateral search would be good for CF is "User-base". Geoscience researchers use an enormous amount of satellite-retrieved data stored with groups by providers notably including NASA and ESA. These agencies use HDF5EOS or netCDF4 to store data from dozens of platforms/instruments (Aura, MLS, OMI, S5P, TES, etc) and quite often data fields are in sibling (not ancestor) groups to the coordinates. For example, S5P has geophysical variables in `/BAND1_RADIANCE/STANDARD_MODE/OBSERVATIONS` and their coordinates (latitude and longitude) in the sibling group `/BAND1_RADIANCE/STANDARD_MODE/GEODATA`. MLS on Aura has geophysical variables in `/HDFEOS/SWATHS/O3NadirSwath/Data\ Fields` and latitude and longitude in `/HDFEOS/SWATHS/O3NadirSwath/Geolocation\ Fields`. The list of other examples exceeds my energy to type it.

This state of affairs developed over many years, and indicates that dataset producers like having coordinates in sibling groups of data fields. I doubt that NASA/ESA/etc.
will start putting coordinates in ancestor groups rather than sibling groups just because CF recommends it, and even if they do, there will remain a decades-high mountain of data with the current organization. If/once CF adopts this groups proposal, one of three things is likely to happen:

1. If CF supports lateral search then these agencies can comply with CF and continue storing data with the same organization as before, and use lateral search to find out-of-group (OOG) coordinates, should they prefer lateral search over relative/absolute paths. Relative/absolute paths are, IMHO, fragile and susceptible to breakage in downstream processing so avoiding them is best for long-term dataset interoperability.
2. If CF DOES NOT support lateral search then these agencies can comply with CF and continue storing data with the same organization as before only by using relative/absolute paths to refer to OOG coordinates. This is fragile and IMHO a mistaken approach for long-term dataset interoperability.
3. NASA/ESA/etc will start to design future datasets so that OOG coordinates are not in sibling groups. Existing data will not be CF-compatible unless it already uses relative/absolute paths.

In my opinion it is best for users, data producers, and CF if some combination of (1) and (3) occurs, so that the sibling-oriented organization of existing datasets is "grandfathered in" to being CF-compliant by allowing lateral search, and producers start to deprecate the necessity of lateral search for future data products. Since many dataset producers have preferred lateral (rather than purely ancestral) dataset organization over the years and lateral search is the appropriate mechanism to resolve lateral associations, CF should respect and support past dataset organization decisions and not impose relative/absolute paths as a requirement for earning CF-conformance for such datasets.
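
To make the difference between ancestor-only and lateral search concrete, here is a minimal Python sketch over a toy group tree modeled on the S5P layout mentioned above. The dictionary representation, the variable name `radiance`, and the function names are illustrative assumptions of this sketch, not part of the proposal text or of any netCDF library. The lateral step is also deliberately simplified to checking only the sibling groups of the starting group, so the algorithm in the PR may well differ in detail.

```python
from posixpath import dirname

# Hypothetical toy file layout: group path -> names of variables stored there.
GROUPS = {
    "/": set(),
    "/BAND1_RADIANCE": set(),
    "/BAND1_RADIANCE/STANDARD_MODE": set(),
    "/BAND1_RADIANCE/STANDARD_MODE/OBSERVATIONS": {"radiance"},
    "/BAND1_RADIANCE/STANDARD_MODE/GEODATA": {"latitude", "longitude"},
}


def ancestors(group):
    """Yield the group itself, then each parent in turn, ending at the root."""
    while True:
        yield group
        if group == "/":
            return
        group = dirname(group)


def children(groups, parent):
    """Return the direct child groups of `parent`, in sorted order."""
    return sorted(g for g in groups if g != "/" and dirname(g) == parent)


def find_variable(groups, start_group, name, lateral=True):
    """Resolve an unqualified variable name starting from `start_group`.

    Step 1 searches the start group and its ancestors.
    Step 2 (optional, simplified) searches the siblings of the start group.
    Returns the absolute path of the first match, or None.
    """
    for g in ancestors(start_group):
        if name in groups[g]:
            return f"{g.rstrip('/')}/{name}"
    if lateral and start_group != "/":
        for sibling in children(groups, dirname(start_group)):
            if sibling != start_group and name in groups[sibling]:
                return f"{sibling}/{name}"
    return None


obs = "/BAND1_RADIANCE/STANDARD_MODE/OBSERVATIONS"
with_lateral = find_variable(GROUPS, obs, "latitude")
without_lateral = find_variable(GROUPS, obs, "latitude", lateral=False)
```

In this sketch, `with_lateral` resolves to the `latitude` variable in the sibling `GEODATA` group, while `without_lateral` is `None`. That is scenario (2): without lateral search, the data variable would have to name its coordinates by an explicit relative or absolute path.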

>> You haven't defined what you mean by ancestor group, sibling group,
>> descendant group, identifier and element. Why not say "nearest" instead of
>> "most proximal"? What does "nearest" mean in the description of "local apex
>> group"?
>
> I think it would be a good idea to add these to the glossary. @czender, do
> you have an opinion on "nearest" vs. "most proximal"?

To me "nearest" and "most proximal" mean the same thing. "Nearest" seems more vernacular, perhaps too vernacular? Nevertheless, I changed to "nearest" in the current PR and added to the glossary.

>> Are "object" and "element" the same? If so, I'd use only one of these terms,
>> and "element" seems better to me because it's not language-specific. Say
>> what sort of thing can be an "element" - a dimension? a variable? of any
>> role?

See immediately below.

> @czender I see no difference here and propose adopting "element". Please
> correct me if I'm overlooking something.

I think there is no difference in the way we use "element" and "object" in the text for the Groups proposal. However, "element" is already frequently used in CF to refer to elements of an array, definitely not the meaning that Groups intends to convey. In practice "objects" as used in the Groups proposal can only be variables (including coordinate variables, of course) and dimensions. In theory a group could itself be an "object", but the proposal does not at this point need to do that. Off-hand, I can't think of an instance where an out-of-group (OOG) attribute needs to be explicitly referred to. Attributes of OOG variables can be important, yet they are always referred to via the OOG variable, and there is never a direct reference to an OOG attribute. It's the variable's scope that matters; an attribute is always locally attached to a variable or group. Are there any counter-examples to this?
If not, I think it is clearer and more precise to eliminate the use of both "element" and "object" in the Groups proposal, and replace those words with what they actually stand for, i.e., "variable" and/or "dimension". I modified the PR to do that.

Similarly, I eliminated "identifier". We could instead have defined "identifier" to mean "a variable or dimension". That would be possible. Nevertheless, I think it's clearer to just say "variable or dimension" everywhere in the text. A group could also be considered an "identifier", though this proposal does not need to do so.

> Next steps
> We'll update the PR to address the points as noted above and ping again when
> that's been done. In addition to the items noted explicitly, I propose
> placing the following new terms in the glossary:
>
> Element (or object)
> Identifier
> Location
> Resolves to
> Nearest dimension
> Ancestor, sibling, descendant group

The updated (by CZ) PR eliminates "element", "object", "identifier", and "Resolves to". The updated glossary now defines "Nearest dimension", "location", and ancestor, sibling, and descendant groups.
