Just thinking about this long-standing principle that files be self-describing. Section 2.6 of the convention states:
“…a file may also contain non-standard attributes. Such attributes do not represent a violation of this standard. Application programs should ignore attributes that they do not recognise or which are irrelevant for their purposes.”

This suggests that there is nothing stopping me from adding opaque metadata to my files (e.g. an EPSG code – this is in fact something that we do for our own internal use). However, someone using generic tools to examine the file (ncview, ncdump) won’t know which attributes are part of the CF standard and which are not. The fact that a subset of the attributes (all the CF attributes plus, possibly, some of the non-CF attributes) are self-describing becomes irrelevant if some of the (non-CF) attributes are opaque.

It appears that the only way for a user to distinguish between CF and non-CF attributes (to work out which are a sufficient subset to interpret the file) is to refer to the CF convention and/or any documentation supplied by the data provider, or to use software that is aware of the CF standard. I would argue that this means that CF-compliant files are not really self-describing given the need to reference external resources (standards/documentation or software). Even if the file did not contain any non-CF attributes, a user unfamiliar with CF would not know this without reference to external resources.

If opaque non-CF metadata are permitted then I’m not sure of the benefit of CF requiring the rest of the attributes to be self-describing (however good that might be in principle). This implies that either non-CF attributes should be prohibited or the principle of self-describing files should be dropped.

Am I missing something? What do others think?
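To make the point about external resources concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: `KNOWN_CF_ATTRIBUTES` is a hand-copied, incomplete subset of the attribute names listed in the CF documents, `partition_attributes` is a hypothetical helper, and `epsg_code` is a made-up name standing in for an opaque non-CF attribute. The attributes are held in a plain dictionary rather than read from a real netCDF file.

```python
# Incomplete, hand-copied subset of attribute names defined by CF.
# This list lives OUTSIDE the data file, which is the point being made.
KNOWN_CF_ATTRIBUTES = frozenset({
    "Conventions", "title", "history", "institution", "source",
    "references", "comment", "standard_name", "long_name", "units",
    "calendar", "axis", "positive", "grid_mapping", "coordinates",
    "cell_methods", "bounds", "_FillValue", "missing_value",
})


def partition_attributes(attrs, known=KNOWN_CF_ATTRIBUTES):
    """Split an attribute dict into (recognised CF, everything else)."""
    cf = {k: v for k, v in attrs.items() if k in known}
    other = {k: v for k, v in attrs.items() if k not in known}
    return cf, other


# Attributes as they might appear on a variable; "epsg_code" is hypothetical.
example_attrs = {
    "standard_name": "air_temperature",
    "units": "K",
    "epsg_code": 27700,
}

cf_attrs, other_attrs = partition_attributes(example_attrs)
print("CF attributes:    ", sorted(cf_attrs))
print("non-CF attributes:", sorted(other_attrs))
```

Nothing in `example_attrs` itself says which keys belong to CF; the split is only possible because the code carries a copy of the vocabulary taken from the convention.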
Dan

From: JonathanGregory <[email protected]>
Sent: Thursday, 25 June 2020 09:57
To: cf-convention/cf-conventions <[email protected]>
Cc: Subscribed <[email protected]>
Subject: Re: [cf-convention/cf-conventions] State the principles for design of the CF conventions (#273)

Dear Karl

Thanks. I have reformulated principle (1), combining yours and mine, and stating the purpose at the start. I think "self-describing" means not using anything outside the file itself, which is stronger than what you suggested. Is this OK?

In response to your first additional point, I've appended a bit to principle (8). Thanks for your second additional point, which is important. I have inserted principle (3) about this. Finally, I have added principle (10), which is partly a corollary of (9), and partly something we've done for its own sake, often advocated by Steve Hankin.

Thus, here is the current proposal:

(1) CF-netCDF metadata is designed to make each dataset self-describing, meaning that it should be interpretable without reference to resources outside itself. To achieve this purpose, CF-netCDF does not use codes, but instead relies on controlled vocabularies containing terms that are chosen as far as practically possible to be self-explanatory (and whose precise definitions are provided in CF documents).

(2) The conventions are changed only as actually required by common use-cases, and not for needs which cannot be anticipated with certainty.

(3) [New] In order to keep them logical, consistent in approach and as simple as possible, the netCDF conventions are devised with and within the conceptual framework of the CF data model.

(4) The conventions should be practicable for both producers and users of data.

(5) The metadata should be both easily readable by humans and easily parsable by programs.

(6) [Slightly reordered] To avoid potential inconsistency within the metadata, the conventions should minimise redundancy.

(7) The conventions should minimise the possibility for mistakes by data-writers and data-readers.

(8) Conventions are provided to allow data-producers to describe the data they wish to produce, rather than attempting to prescribe what data they should produce; [new] consequently most CF conventions are optional.

(9) Because many datasets remain in use for a long time after production, it is desirable that metadata written according to previous versions of the convention should also be compliant with and have the same interpretation under later versions.

(10) [New] Because all previous versions must generally continue to be supported in software for the sake of archived datasets, and in order to limit the complexity of the conventions, there is a strong preference against introducing any new capability to the conventions when there is already some method that can adequately serve the same purpose (even if a different method would arguably be better than the existing one).

Cheers

Jonathan

--
You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub <https://github.com/cf-convention/cf-conventions/issues/273#issuecomment-649396973>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/ANWNP6RQO3WV4MLG4QLFBQ3RYMGOFANCNFSM4NZQXDKQ>.

--
Reply to this email directly or view it on GitHub:
https://github.com/cf-convention/cf-conventions/issues/273#issuecomment-649489991
