I have been thinking about the long-standing principle that files be self-describing. 
Section 2.6 of the convention states:

“…a file may also contain non-standard attributes. Such attributes do not 
represent a violation of this standard. Application programs should ignore 
attributes that they do not recognise or which are irrelevant for their 
purposes.”

This suggests that there is nothing stopping me from adding opaque metadata to 
my files (e.g. an EPSG code – this is in fact something that we do for our own 
internal use). However, someone using generic tools to examine the file 
(ncview, ncdump) won’t know which attributes are part of the CF standard and 
which are not. The fact that a subset of the attributes (all the CF attributes 
plus, possibly, some of the non-CF attributes) are self-describing becomes 
irrelevant if some of the (non-CF) attributes are opaque.

It appears that the only way for a user to distinguish between CF and non-CF 
attributes (to work out which are a sufficient subset to interpret the file) is 
to refer to the CF convention and/or any documentation supplied by the data 
provider, or to use software that is aware of the CF standard. I would argue 
that this means that CF-compliant files are not really self-describing given 
the need to reference external resources (standards/documentation or software). 
Even if the file did not contain any non-CF attributes, a user unfamiliar with 
CF would not know this without reference to external resources.
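
To make the point concrete, here is a minimal Python sketch of what CF-aware software would have to do to separate the two kinds of attribute. It is only an illustration, not how any existing tool works. The name list is a small hand-copied subset of the attribute names in the CF documents, not the full list, and `partition_attributes` and the `epsg_code` attribute name are hypothetical, made up for this example.

```python
# Illustrative subset of attribute names defined by the CF documents.
# Assumption: this is NOT the complete list; a real tool would need all of it.
CF_ATTRIBUTES = frozenset({
    "Conventions", "title", "history", "institution", "source",
    "references", "comment", "standard_name", "long_name", "units",
    "axis", "calendar", "positive", "bounds", "cell_methods",
    "coordinates", "grid_mapping", "missing_value", "_FillValue",
    "valid_min", "valid_max", "valid_range", "scale_factor", "add_offset",
})


def partition_attributes(attrs):
    """Split an attribute mapping into (CF-recognised, other) dictionaries."""
    cf = {name: value for name, value in attrs.items() if name in CF_ATTRIBUTES}
    other = {name: value for name, value in attrs.items() if name not in CF_ATTRIBUTES}
    return cf, other


# Attributes of a made-up variable; "epsg_code" is a hypothetical non-CF name.
example = {
    "standard_name": "air_temperature",
    "units": "K",
    "epsg_code": 27700,
}

cf_attrs, other_attrs = partition_attributes(example)
print("CF attributes:    ", sorted(cf_attrs))
print("non-CF attributes:", sorted(other_attrs))
```

Note that the name list cannot be derived from the file: it has to be copied from the CF documents, which is exactly the external reference I am describing.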

If opaque non-CF metadata are permitted then I’m not sure what benefit there is 
in CF requiring the rest of the attributes to be self-describing (however good 
that might be in principle). It seems to follow that either non-CF attributes 
should be prohibited or the principle of self-describing files should be dropped.

Am I missing something? What do others think?

Dan


From: JonathanGregory <[email protected]>
Sent: Thursday, 25 June 2020 09:57
To: cf-convention/cf-conventions <[email protected]>
Cc: Subscribed <[email protected]>
Subject: Re: [cf-convention/cf-conventions] State the principles for design of 
the CF conventions (#273)


Dear Karl

Thanks. I have reformulated principle (1), combining yours and mine, and 
stating the purpose at the start. I think "self-describing" means not using 
anything outside the file itself, which is stronger than what you suggested. Is 
this OK?

In response to your first additional point, I've appended a bit to principle 
(8). Thanks for your second additional point, which is important. I have 
inserted principle (3) about this. Finally, I have added principle (10), which 
is partly a corollary of (9), and partly something we've done for its own sake, 
often advocated by Steve Hankin.

Thus, here is the current proposal:

(1) CF-netCDF metadata is designed to make each dataset self-describing, 
meaning that it should be interpretable without reference to resources outside 
itself. To achieve this purpose, CF-netCDF does not use codes, but instead 
relies on controlled vocabularies containing terms that are chosen as far as 
practically possible to be self-explanatory (and whose precise definitions are 
provided in CF documents).

(2) The conventions are changed only as actually required by common use-cases, 
and not for needs which cannot be anticipated with certainty.

(3) [New] In order to keep them logical, consistent in approach and as simple 
as possible, the netCDF conventions are devised with and within the conceptual 
framework of the CF data model.

(4) The conventions should be practicable for both producers and users of data.

(5) The metadata should be both easily readable by humans and easily parsable 
by programs.

(6) [Slightly reordered] To avoid potential inconsistency within the metadata, 
the conventions should minimise redundancy.

(7) The conventions should minimise the possibility for mistakes by 
data-writers and data-readers.

(8) Conventions are provided to allow data-producers to describe the data they 
wish to produce, rather than attempting to prescribe what data they should 
produce; [new] consequently most CF conventions are optional.

(9) Because many datasets remain in use for a long time after production, it is 
desirable that metadata written according to previous versions of the 
convention should also be compliant with and have the same interpretation under 
later versions.

(10) [New] Because all previous versions must generally continue to be 
supported in software for the sake of archived datasets, and in order to limit 
the complexity of the conventions, there is a strong preference against 
introducing any new capability to the conventions when there is already some 
method that can adequately serve the same purpose (even if a different method 
would arguably be better than the existing one).

Cheers

Jonathan

—
You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on 
GitHub<https://github.com/cf-convention/cf-conventions/issues/273#issuecomment-649396973>,
 or 
unsubscribe<https://github.com/notifications/unsubscribe-auth/ANWNP6RQO3WV4MLG4QLFBQ3RYMGOFANCNFSM4NZQXDKQ>.


-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/cf-convention/cf-conventions/issues/273#issuecomment-649489991
