I love the list of classifiers and hope that discussion can continue. Having also tried over the years to come up with a pervasive system for standard names (both in CF and in other contexts), I have some observations.

Naming Effort: It appears CF standard names were originally Much More about coming up with the right name, and partially partitioning useful characteristics, than about a precise definition. This reflects the original community needs, I think; as community needs for precision have grown, so has attention to the definition. But Jonathan is spot-on: getting a name that reflects both the meaning AND community usage has been the challenge. While it frustrates name proposers, it provides great comfort to users.

Normalization and uniqueness: If I understand the proposal correctly, it calls for tracking all the orthogonal classifiers as possible components of the standard name. ('These independent bits of information could be automatically assembled together to create the "standard name".') Is this any different from a database key construction from multiple independent columns of data? Each unique combination of the n components makes another possible name, and the meaning is encoded into the name itself. Exclusion of a component from the name means all values are accepted in that axis.
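To make the database-key analogy concrete, here is a rough Python sketch. The category names (`surface`, `medium`, `quantity`), their ordering, the underscore separator, and the function `assemble_name` are all hypothetical, made up for illustration; none of them comes from CF or from the proposal itself.

```python
# Hypothetical categories in a fixed order; not taken from CF or the proposal.
CATEGORY_ORDER = ("surface", "medium", "quantity")


def assemble_name(components):
    """Build a name from the categories present, like a composite key.

    A category absent from `components` is simply left out of the name,
    which is read as "all values accepted on that axis".
    """
    unknown = set(components) - set(CATEGORY_ORDER)
    if unknown:
        raise ValueError(f"unknown categories: {sorted(unknown)}")
    parts = [components[c] for c in CATEGORY_ORDER if c in components]
    return "_".join(parts)


# Illustrative values only.
full = assemble_name(
    {"quantity": "temperature", "medium": "sea_water", "surface": "sea_floor"}
)
partial = assemble_name({"quantity": "temperature", "medium": "sea_water"})
print(full)
print(partial)
```

Note that the shorter name silently covers every value of the omitted category, which is exactly the behavior I am asking about.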

Length and Complexity: It will be a Very Long standard name in many cases. No technical limitations, probably, but social reaction to these long names will be poor at best. (And will depend on some particularly clever way to indicate omitted categories when constructing the name.) Of course, more common cases will usually be shorter, but people won't always put in the relevant categories, or won't realize they are relevant. ("Oh, c'mon, everyone knows that *has* to be over water.") Like filling out metadata, detail will be avoided during name creation, for better and for worse.

Unique Identifiers for Resources: I agree with Benno: CF absolutely should have a separate resource identifier on the web for (a) all the existing and historical standard names, and (b) any name you come up with in this system. (I am separately engaged in creating and serving identifiers for vocabulary terms, so of course I would feel that way. We just now have a service that can provide this; I just started pursuing its application for/with CF.) As an aside, this proposal may be a case where using opaque codes as the identifier, and the standard name as a label string, offers improved value to users.

Unique Identifiers for Data Set Variables: This was proposed as a solution "to identify with a single standard name, closely related variables that one might want to store in a single array". I discourage using standard names as "the unique names for a data set", because there will always be a category for differentiating variables that isn't available in the standard convention (primary vs. secondary instrument, first/second/third installed sensor, clean/dirty, and on and on). Standard names should be used to describe each variable, not name it.

Defining Similarity: For a variable mapping exercise, we considered what makes one thing the 'same as' something else. The answer is (of course) 'it depends'. The great advantage of this proposed approach is that it 'normalizes' the distinctions into the separate categories, so the user can evaluate the match much more directly for his or her own needs. But be aware that it will move the discussions of similarity and difference into the next layer of semantic detail ("does 'body of water' include underground streams?" and so on).
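As a rough illustration of what evaluating the match "more directly" could look like, here is a small Python sketch that compares two variable descriptions category by category. The function `compare`, the category names, and the terms are hypothetical; this is one possible reading, not something the proposal specifies.

```python
def compare(a, b):
    """Report, per category, whether two descriptions agree.

    Returns a dict mapping category -> "same", "different", or
    "unspecified" (at least one side omits that category).
    """
    result = {}
    for category in sorted(set(a) | set(b)):
        if category not in a or category not in b:
            result[category] = "unspecified"
        elif a[category] == b[category]:
            result[category] = "same"
        else:
            result[category] = "different"
    return result


# Illustrative descriptions only.
var_a = {"quantity": "temperature", "medium": "sea_water"}
var_b = {"quantity": "temperature", "medium": "body_of_water", "surface": "sea_floor"}
report = compare(var_a, var_b)
print(report)
```

The user then decides whether a "different" or "unspecified" result matters for the task at hand. Whether `sea_water` should count as a kind of `body_of_water` is precisely the next-layer question; a plain string comparison like this one cannot answer it.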

Central Catalog: If the rules are deterministic, and every category has a controlled vocabulary, you don't need a single list of what names (i.e., combinations of categories) are approved; any possible combination of category terms is legal, right? This is fortunate, as the number of proposed names may indeed grow very large very quickly, and people will often just construct the names without bothering to submit them. You also don't need definitions; the definition is the compilation of all the displayed components in that name. (If it *isn't* the same as the aggregation, then there is by definition another axis of interest that needs to be turned into a category, or you will have two standard names that look the same but have different meanings.) So this is really a system for creating a single-label categorization scheme across multiple axes; no catalog is strictly needed for the naming convention to work.
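A minimal Python sketch of that reasoning, assuming each category has its own controlled vocabulary with a short definition per term. The vocabularies, terms, and definitions below are invented placeholders, and `is_legal` and `definition` are hypothetical helper names.

```python
# Hypothetical controlled vocabularies: category -> {term: short definition}.
VOCABULARIES = {
    "medium": {"sea_water": "water in the ocean", "air": "the atmosphere"},
    "quantity": {"temperature": "temperature", "salinity": "salinity"},
}


def is_legal(components):
    """A combination is legal if every term is in its category's vocabulary."""
    return bool(components) and all(
        category in VOCABULARIES and term in VOCABULARIES[category]
        for category, term in components.items()
    )


def definition(components):
    """Compile the definition from the definitions of the components."""
    if not is_legal(components):
        raise ValueError("not a legal combination")
    return "; ".join(
        f"{category}: {VOCABULARIES[category][components[category]]}"
        for category in sorted(components)
    )


print(is_legal({"medium": "sea_water", "quantity": "salinity"}))
print(definition({"medium": "sea_water", "quantity": "salinity"}))
```

No list of approved names appears anywhere: legality and the definition are both derived from the per-category vocabularies alone.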

Semantics and Ontologies: With this proposal, we are much further into creating classification systems for all concepts relevant to CF names (as opposed to conceptually linking the existing CF concepts, which is slightly different). I think this is inevitably a direction to be taken by someone -- witness the Plasmo work -- but it turns the process into something very much like other knowledge classification efforts in the semantic community. That isn't a pro or a con, just an observation. There are lessons to be learned and tools to be reused from work that has gone before. In that regard, I would love to be informed of existing vocabularies (formal or informal) for each of these categories, particularly the first two. (Can we start a wiki page for this info somewhere?)

In summary, I love this idea in principle, but think we can expect a stately progression toward seeing it in action. It serves a different need and audience than Standard Names, and so perhaps should be considered and developed separately, not necessarily as a replacement for them.

John

--------------
John Graybeal   <mailto:[EMAIL PROTECTED]>  -- 831-775-1956
Monterey Bay Aquarium Research Institute
Marine Metadata Interoperability Project: http://marinemetadata.org
