Dear John

> We'd like to come up with a clear statement of what standard names are (or
> should be), and what are the problems and issues that we should be focusing
> on next.

Thanks for this posting. We've had several discussions about what standard
names are for and how they are constructed, and I've found those discussions
helpful to clarify ideas. This is what I currently think (partly repeating
bits of recent postings):

* Standard names are not really "names". They are very brief definitions of
  the quantities concerned, answering the question, "What does that mean?".
  Therefore they are often longer than the terms used in scientific
  literature.

* Standard names are an important element of the purpose of CF, to "define
  metadata that provide a definitive description of what the data in each
  variable represents ... This enables users of data from different sources
  to decide which quantities are comparable". Therefore standard names
  distinguish quantities which need to be distinguished, but they can also
  be deliberately vague when quantities from a different data source should
  share a standard name because they are regarded as comparable. Standard
  names thus have various degrees of precision, and the choice of which set
  of standard names to use depends on the application.

* Like CF in general, the standard name table was initially intended for
  climate and forecast model output, for describing properties of the
  simulated world. The same standard names obviously apply for the same
  quantities measured or inferred in the real world. However for
  measurements we also have to describe (a) "raw" data, which comes from
  instruments and is used to produce data about the real world, and (b)
  properties of the measurement system. We have added some standard names
  for these purposes, but we may need a clearer policy for doing it.

* It takes time and effort to devise new standard names. Proposals which are
  analogous to existing names can often be agreed quickly. The hard work
  comes in deciding how to describe new concepts in a way which is clear and
  consistent with existing names.
  This work requires scientific understanding of the concepts being
  described, and thus depends on relevant expertise. In order to make this
  go faster, it might help to have better tools for analysing the existing
  names.

* We attempt to construct standard names systematically, using words and
  phrases with consistent meanings and in a consistent order. This is to
  avoid implying illusory distinctions, and to reduce mistakes which would
  be made if names differed unexpectedly. Some of the rules are written down
  in the guidelines, but these are not comprehensive. It would probably help
  speed up development if we did state all the rules explicitly. That would
  make it more obvious when a new proposal is like existing ones, and when
  we have to decide on new patterns or vocabulary.

* The guidelines are not followed in all cases, because for some standard
  names we have adopted familiar but unsystematic terms. Also, there is
  often more than one possible systematic description of a quantity, but
  obviously only one can be chosen for the table.

* We try to use familiar words and phrases when choosing standard names, but
  it is more important for them to be self-explanatory and to avoid jargon,
  the target audience being any scientific user of the data. The names
  should at least indicate to any such user which general area they refer
  to.

* Quantities which have different physical dimensions (different SI units)
  are always regarded as distinct, and must have different standard names.
  Units must be consistent with the standard name; we do not use units to
  distinguish between quantities.

* Standard names do not provide metadata which could have infinitely many
  possible values. In particular, spatiotemporal coordinates and numerical
  parameters are specified by coordinate variables, not as part of the
  standard name. That means there is not a standard name for 2 m air
  temperature, for example, since CF regards "2 m" as a coordinate.
  However, surfaces which are identified by a physical description rather
  than a parameter value (e.g. toa) are included in standard names, because
  there is only a small set of possibilities.

* We could use string-valued coordinates for parts of standard names that
  could be regarded as parameters with a discrete set of values, like
  chemical species. We haven't decided to do that yet, but it's a
  possibility. In that approach any combination of parameter and standard
  name would be allowed, whereas when the parameter is part of the standard
  name (as is the case at present with chemical species) the legal
  combinations are defined explicitly by the standard name table. The latter
  makes more work in constructing the standard name table, but avoids
  nonsensical metadata.

* Not all the descriptive part of the metadata is included in the standard
  name. Other attributes are also important, such as cell_methods. A
  separate attribute is useful to contain metadata that is relevant for a
  wide range of quantities, because in that case "factorising" it out of the
  standard name leads to a large reduction in the size of the standard name
  table.

* Common concepts have been proposed as a way to identify particular
  combinations of standard names with other metadata. They would complement
  standard names, other attributes and coordinates.

Best wishes

Jonathan
_______________________________________________
CF-metadata mailing list
[email protected]
http://mailman.cgd.ucar.edu/mailman/listinfo/cf-metadata
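
As an illustration of the point above about 2 m air temperature, here is a
minimal sketch of how such metadata might be laid out. It uses plain Python
dictionaries rather than any netCDF library. The variable names "tas",
"height" and "rlut", the dictionary layout and the describe() helper are
assumptions made up for this example and are not part of CF. The
standard_name and units values are meant to mirror entries in the standard
name table.

```python
# Plain-dict sketch of CF-style metadata; no netCDF library is involved.
# The variable names and the dict layout are illustrative assumptions.

# The quantity is identified by its standard name; the "2 m" is not in it.
air_temperature_2m = {
    "name": "tas",
    "attributes": {
        "standard_name": "air_temperature",
        "units": "K",
        "coordinates": "height",
    },
}

# The numerical parameter "2 m" is carried by a scalar coordinate variable.
height = {
    "name": "height",
    "value": 2.0,
    "attributes": {
        "standard_name": "height",
        "units": "m",
        "positive": "up",
    },
}

# A surface identified by a physical description (toa) is part of the name,
# so no vertical coordinate is needed to say where the flux applies.
toa_longwave = {
    "name": "rlut",
    "attributes": {
        "standard_name": "toa_outgoing_longwave_flux",
        "units": "W m-2",
    },
}


def describe(variable, coordinate=None):
    """Build a short description from the standard name and, if given,
    the value of one scalar coordinate (hypothetical helper)."""
    text = variable["attributes"]["standard_name"]
    if coordinate is not None:
        coord_attrs = coordinate["attributes"]
        text += (
            f" at {coord_attrs['standard_name']} = "
            f"{coordinate['value']:g} {coord_attrs['units']}"
        )
    return text


print(describe(air_temperature_2m, height))
print(describe(toa_longwave))
```

A 10 m air temperature would reuse the same standard name with a different
coordinate value, so the table needs no extra entry for it.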
