All, the changes described below were just submitted as a PR to Daniel's tree, 
which hopefully means they will modify #145 once accepted. I'm no GitHub 
expert, so this may not be the most efficient route. The original is viewable 
in my tree at https://github.com/czender/cf-conventions/tree/groups

> I'm responding to your comments inline and abridging as I determine relevant; 
> if there's anything you're missing, let me know.
> 
>> ... I'm not convinced about the lateral search. You don't sound convinced 
>> about it either, in saying that it's allowed for backward-compatibility and 
>> may be deprecated in future. Why not deprecate it now (meaning that the 
>> CF-checker would give a warning)? Why allow it at all? Is there a more 
>> specific extra search rule you could provide instead of your general lateral 
>> algorithm to deal with the existing datasets you have in mind?
> 
> I have to agree with you on this, and this is in line with my understanding 
> of the discussion at the meeting in Reading. @czender could you say a few 
> words on this?

I must use many words. I wish I had been at the Reading conference to say this 
in person:

I may be the only defender of the lateral search feature so it falls on me to 
make the case for it. The one word that best summarizes why I think lateral 
search would be good for CF is "User-base". Geoscience researchers use an 
enormous amount of satellite-retrieved data stored with groups by providers 
notably including NASA and ESA. These agencies use HDF5EOS or netCDF4 to store 
data from dozens of platforms/instruments (Aura, MLS, OMI, S5P, TES, etc) and 
quite often data fields are in sibling (not ancestor) groups to the 
coordinates. For example, S5P has geophysical variables in 
`/BAND1_RADIANCE/STANDARD_MODE/OBSERVATIONS` and their coordinates (latitude 
and longitude) in the sibling group `/BAND1_RADIANCE/STANDARD_MODE/GEODATA`. 
MLS on Aura has geophysical variables in 
`/HDFEOS/SWATHS/O3NadirSwath/Data\ Fields` and latitude and longitude in 
`/HDFEOS/SWATHS/O3NadirSwath/Geolocation\ Fields`. There are more examples 
than I have the energy to type.
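
To make the sibling layout concrete, here is a minimal Python sketch of how a 
bare coordinate name might be resolved for the S5P-style tree above. It assumes 
a simplified rule (referring group first, then ancestors, then a breadth-first 
search downward from the root); the exact rule in the PR may differ. The nested 
dict layout, the helper names `group` and `resolve`, and the variable name 
`radiance` are all hypothetical, not part of any netCDF or HDF API.

```python
from collections import deque

def group(variables=(), **subgroups):
    """Toy stand-in for a file group: a set of variable names plus child groups."""
    return {"variables": set(variables), "groups": subgroups}

# Group names follow the S5P example; "radiance" is an illustrative variable name.
root = group(
    BAND1_RADIANCE=group(
        STANDARD_MODE=group(
            OBSERVATIONS=group(variables=["radiance"]),
            GEODATA=group(variables=["latitude", "longitude"]),
        )
    )
)

def resolve(tree, referring_path, name):
    """Return the path of the group that holds `name`, or None if not found."""
    parts = [p for p in referring_path.split("/") if p]

    # Step 1: the referring group, then each ancestor up to the root.
    for depth in range(len(parts), -1, -1):
        node = tree
        for p in parts[:depth]:
            node = node["groups"][p]
        if name in node["variables"]:
            return "/" + "/".join(parts[:depth])

    # Step 2 (assumed lateral rule): breadth-first search from the root.
    queue = deque([("", tree)])
    while queue:
        path, node = queue.popleft()
        if name in node["variables"]:
            return path or "/"
        for child_name, child in node["groups"].items():
            queue.append((path + "/" + child_name, child))
    return None

here = "/BAND1_RADIANCE/STANDARD_MODE/OBSERVATIONS"
print(resolve(root, here, "latitude"))
# /BAND1_RADIANCE/STANDARD_MODE/GEODATA
```

An ancestor-only search (step 1 alone) returns nothing for `latitude` here, 
which is exactly the case these datasets present.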

This state of affairs developed over many years, and indicates that dataset 
producers like having coordinates in sibling groups of data fields. I doubt 
that NASA/ESA/etc. will start putting coordinates in ancestor groups rather 
than sibling groups just because CF recommends it, and even if they do, there 
will remain a decades-high mountain of data with the current organization. 
If/once CF adopts this groups proposal, one of three things is likely to 
happen: 
1. If CF supports lateral search then these agencies can comply with CF and 
continue storing data with the same organization as before, and use lateral 
search to find out-of-group (OOG) coordinates, should they prefer lateral 
search over relative/absolute paths. Relative/absolute paths are, IMHO, fragile 
and susceptible to breakage in downstream processing so avoiding them is best 
for long-term dataset interoperability.
2. If CF DOES NOT support lateral search then these agencies can comply with CF 
and continue storing data with the same organization as before only by using 
relative/absolute paths to refer to OOG coordinates. This is fragile and IMHO a 
mistaken approach for long-term dataset interoperability.
3. NASA/ESA/etc will start to design future datasets so that OOG coordinates 
are not in sibling groups. Existing data will not be CF-compatible unless it 
already uses relative/absolute paths.

In my opinion it is best for users, data producers, and CF if some combination 
of (1) and (3) occurs: the sibling-oriented organization of existing datasets 
is "grandfathered in" to CF compliance by allowing lateral search, and 
producers start to phase out the need for lateral search in future data 
products. Many dataset producers have preferred lateral (rather than purely 
ancestral) dataset organization over the years, and lateral search is the 
appropriate mechanism to resolve lateral associations. CF should therefore 
respect and support past dataset organization decisions and not impose 
relative/absolute paths as a requirement for earning CF conformance for such 
datasets.
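
As a rough illustration of the fragility argument, the Python sketch below 
shows one failure mode only: a downstream tool renames the geolocation group, 
so a stored relative path stops resolving while a name-based search among 
siblings still finds the coordinate. The dict layout, the helper names, the 
`../GROUP/var` reference syntax, and the renamed group `GEOLOCATION` are 
hypothetical, and the sketch does not cover cases where a name-based search 
would itself be ambiguous.

```python
def make_tree(geo_name):
    """Toy parent group with a data group and a geolocation group as siblings."""
    return {
        "variables": set(),
        "groups": {
            "OBSERVATIONS": {"variables": {"radiance"}, "groups": {}},
            geo_name: {"variables": {"latitude", "longitude"}, "groups": {}},
        },
    }

def resolve_relative(parent, ref):
    """Resolve a '../GROUP/var' reference made from a child of `parent`."""
    up, group_name, var = ref.split("/")
    if up != "..":
        raise ValueError("this sketch only handles '../GROUP/var' references")
    grp = parent["groups"].get(group_name)
    if grp is not None and var in grp["variables"]:
        return group_name
    return None  # the stored path no longer points at anything

def resolve_sibling_search(parent, var):
    """Search the sibling groups under `parent` for a variable by name."""
    for name, grp in parent["groups"].items():
        if var in grp["variables"]:
            return name
    return None

original = make_tree("GEODATA")
renamed = make_tree("GEOLOCATION")   # hypothetical downstream rename
stored_ref = "../GEODATA/latitude"   # reference written by the producer

print(resolve_relative(original, stored_ref))        # GEODATA
print(resolve_relative(renamed, stored_ref))         # None
print(resolve_sibling_search(renamed, "latitude"))   # GEOLOCATION
```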

>> You haven't defined what you mean by ancestor group, sibling group, 
>> descendant group, identifier and element. Why not say "nearest" instead of 
>> "most proximal"? What does "nearest" mean in the description of "local apex 
>> group"?
> 
> I think it would be a good idea to add these to the glossary. @czender, do 
> you have an opinion on "nearest" vs. "most proximal"?

To me "nearest" and "most proximal" mean the same thing. "Nearest" seems more 
vernacular, perhaps too vernacular? Nevertheless, I changed to "nearest" in the 
current PR and added to the glossary.

>> Are "object" and "element" the same? If so, I'd use only one of these terms, 
>> and "element" seems better to me because it's not language-specific. Say 
>> what sort of thing can be an "element" - a dimension? a variable? of any 
>> role?

See immediately below.

> @czender I see no difference here and propose adopting "element". Please 
> correct me if I'm overlooking something.

I think there is no difference in the way we use "element" and "object" in the 
text of the Groups proposal. However, "element" is already frequently used in 
CF to refer to elements of an array, which is definitely not the meaning that 
Groups intends to convey. In practice "objects" as used in the Groups proposal 
can only be variables (including coordinate variables, of course) and 
dimensions. In theory a group could itself be an "object", but the proposal 
does not need that at this point. Off-hand, I can't think of an instance where an 
out-of-group (OOG) attribute needs to be explicitly referred to. Attributes of 
OOG variables can be important, yet they are always referred to via the OOG 
variable, and there is never a direct reference to an OOG attribute. It's the 
variable's scope that matters; an attribute is always locally attached to a 
variable or group. Are there any counter-examples to this?

If not, I think it is clearer and more precise to eliminate the use of both 
"element" and "object" in the Groups proposal, and replace those words with 
what they actually stand for, i.e., "variable" and/or "dimension". I modified 
the PR to do that. Similarly, I eliminated "identifier". We could instead have 
defined "identifier" to mean "a variable or dimension". That would be possible. 
Nevertheless, I think it's clearer to just say "variable or dimension" 
everywhere in the text. A group could also be considered an "identifier", 
though this proposal does not need to do so.

> Next steps
> We'll update the PR to address the points as noted above and ping again when 
> that's been done. In addition to the items noted explicitly, I propose 
> placing the following new terms in the glossary:
> 
> Element (or object)
> Identifier
> Location
> Resolves to
> Nearest dimension
> Ancestor, sibling, descendant group

The updated (by CZ) PR eliminates "element", "object", "identifier", and 
"resolves to". The updated glossary now defines "nearest dimension", 
"location", and ancestor, sibling, and descendant groups.

