Re: [CF-metadata] a different (but perhaps unoriginal) approach to standard name construction

John Graybeal Tue, 04 Nov 2008 08:55:49 -0800

I love the list of classifiers and hope that discussion can continue. Having also tried to come up with a pervasive system for standard names (both in CF and in other contexts) over the years, here are some observations.

Naming Effort: It appears CF standard names were originally Much More about coming up with the right name, and partially partitioning useful characteristics, than about a precise definition. This reflects the original community needs, I think; as community needs for precision have grown, so has attention to the definition. But Jonathan is spot-on: getting a name that reflects both the meaning AND community usage has been the challenge. While that effort frustrates name proposers, it provides great comfort to users.

Normalization and uniqueness: If I understand the proposal correctly, it calls for tracking all the orthogonal classifiers as possible components of the standard name. ('These independent bits of information could be automatically assembled together to create the "standard name".') Is this any different from constructing a database key from multiple independent columns of data? Each unique combination of the n components makes another possible name, and the meaning is encoded into the name itself. Excluding a component from the name means all values are accepted along that axis.
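To make the composite-key analogy concrete, here is a minimal sketch. The axis names (quantity, medium, location, process) are purely hypothetical placeholders, not categories from the proposal or from CF; the tuple is the real key, and the underscore-joined string is only a readable label. An omitted axis is `None` and acts as a wildcard when matching.

```python
from __future__ import annotations

from typing import NamedTuple, Optional


class NameKey(NamedTuple):
    """Hypothetical composite key: one field per classifier axis; None means omitted."""
    quantity: str
    medium: Optional[str] = None
    location: Optional[str] = None
    process: Optional[str] = None


def label(key: NameKey) -> str:
    """Assemble a readable name from the components that are present, in axis order."""
    return "_".join(part for part in key if part is not None)


def accepts(query: NameKey, candidate: NameKey) -> bool:
    """An axis omitted from the query (None) accepts any value along that axis."""
    return all(q is None or q == c for q, c in zip(query, candidate))


if __name__ == "__main__":
    key = NameKey("temperature", medium="sea_water")
    print(label(key))                             # temperature_sea_water
    print(accepts(NameKey("temperature"), key))   # True: medium left open
```

Note that `label(NameKey("temperature", location="sea_water"))` produces the same string as the example above, which is exactly the "omitted categories" problem raised in the next paragraph: the flat string loses information that the key keeps.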

Length and Complexity: It will be a Very Long standard name in many cases. There are probably no technical limitations, but social reaction to these long names will be poor at best. (And they will depend on some particularly clever way to indicate omitted categories when constructing the name.) Of course, more common cases will usually be shorter, but people won't always put in the relevant categories, or won't realize they are relevant. ("Oh, c'mon, everyone knows that *has* to be over water.") As with filling out metadata, detail will be avoided during name creation, for better and for worse.

Unique Identifiers for Resources: I agree with Benno: CF absolutely should have a separate resource identifier on the web for (a) all the existing and historical standard names, and (b) any name you come up with in this system. (I am separately engaged in creating and serving identifiers for vocabulary terms, so of course I would feel that way. We now have a service that can provide this; I have just started pursuing its application for/with CF.) As an aside, this proposal may be a case where using opaque codes as the identifier, and the standard name as a label string, offers improved value to users.
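The opaque-code idea can be sketched as a small registry in which the identifier never changes and the standard name is just an editable label attached to it. Everything here (the `TermRegistry` class, the `term:` prefix, the six-digit format) is a hypothetical illustration, not the identifier service mentioned above.

```python
from __future__ import annotations

import itertools


class TermRegistry:
    """Hypothetical registry: opaque, never-reused IDs; the standard name is only a label."""

    def __init__(self, prefix: str = "term") -> None:
        self._counter = itertools.count(1)   # monotonically increasing, never reused
        self._prefix = prefix
        self.labels: dict[str, str] = {}     # opaque ID -> current human-readable label

    def register(self, label: str) -> str:
        """Mint a new opaque ID and attach the given label to it."""
        term_id = f"{self._prefix}:{next(self._counter):06d}"
        self.labels[term_id] = label
        return term_id

    def relabel(self, term_id: str, new_label: str) -> None:
        """Change the label; anything that stored term_id keeps working."""
        if term_id not in self.labels:
            raise KeyError(term_id)
        self.labels[term_id] = new_label
```

The design point is that a long, assembled name can be revised (or an axis added) without breaking data sets that cite the identifier.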

Unique Identifiers for Data Set Variables: This was proposed as a solution "to identify with a single standard name, closely related variables that one might want to store in a single array". I discourage using standard names as "the unique names for a data set", because there will always be a category for differentiating variables that isn't available in the standard convention (primary vs. secondary instrument, first/second/third installed sensor, clean/dirty, and on and on). Standard names should be used to describe each variable, not to name it.

Defining Similarity: For a variable mapping exercise, we considered what makes one thing the 'same as' something else. The answer is (of course) 'it depends'. The great advantage of this proposed approach is that it 'normalizes' the distinctions into the separate categories, so the user can evaluate the match much more directly for his or her own needs. But be aware that it will move the discussions of similarity and difference into the next layer of semantic detail ("does 'body of water' include underground streams?" and so on).
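Because each distinction lives on its own axis, "same as" can be decided per user: list the axes on which two variables differ, then let the user say which of those differences they can tolerate. A minimal sketch, with hypothetical axis names and plain dicts standing in for the classified names:

```python
from __future__ import annotations


def axis_differences(a: dict[str, str], b: dict[str, str]) -> dict[str, tuple]:
    """Return {axis: (value_in_a, value_in_b)} for every axis where the two differ.
    An axis present in only one of them counts as a difference (the other side is None)."""
    axes = sorted(set(a) | set(b))
    return {ax: (a.get(ax), b.get(ax)) for ax in axes if a.get(ax) != b.get(ax)}


def same_for_my_purpose(a: dict[str, str], b: dict[str, str], ignore: set[str]) -> bool:
    """Treat two variables as equivalent if they differ only on axes the user ignores."""
    return set(axis_differences(a, b)) <= ignore


if __name__ == "__main__":
    surface = {"quantity": "temperature", "medium": "sea_water", "level": "surface"}
    bottom = {"quantity": "temperature", "medium": "sea_water", "level": "bottom"}
    print(axis_differences(surface, bottom))   # {'level': ('surface', 'bottom')}
    print(same_for_my_purpose(surface, bottom, ignore={"level"}))  # True
```

As the paragraph warns, this only pushes the argument down a level: whether `sea_water` should match some other medium term is still a question about the vocabulary, not the code.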

Central Catalog: If the rules are deterministic, and every category has a controlled vocabulary, you don't need a single list of which names (i.e., combinations of categories) are approved; any possible combination of category terms is legal, right? This is fortunate, as the number of proposed names may indeed grow very large very quickly, and people will often just construct the names without bothering to submit them. You also don't need definitions; the definition is the compilation of all the displayed components in that name. (If it *isn't* the same as the aggregation, then by definition there is another axis of interest that needs to be turned into a category, or you will have two standard names that look the same but have different meanings.) So this is really a system for creating a single-label categorization scheme across multiple axes; no catalog is strictly needed for the naming convention to work.
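Here is a minimal sketch of "no catalog needed": a name is legal exactly when each component is a term from its axis's controlled vocabulary, and its definition is generated by concatenating the components' glosses. The vocabularies and glosses below are hypothetical stand-ins, not real CF terms or definitions.

```python
from __future__ import annotations

# Hypothetical controlled vocabularies, one per axis (dict order = axis order).
# Each term maps to a short gloss used to build the definition.
VOCABULARIES: dict[str, dict[str, str]] = {
    "quantity": {"temperature": "temperature", "salinity": "salinity"},
    "medium": {"sea_water": "in sea water", "air": "in air"},
    "level": {"surface": "at the surface", "bottom": "at the bottom"},
}


def is_legal(components: dict[str, str]) -> bool:
    """Legal iff non-empty and every component is a known axis with a term from its vocabulary."""
    return bool(components) and all(
        axis in VOCABULARIES and term in VOCABULARIES[axis]
        for axis, term in components.items()
    )


def definition(components: dict[str, str]) -> str:
    """The definition is just the glosses of the components present, in axis order."""
    if not is_legal(components):
        raise ValueError(f"not in the controlled vocabularies: {components}")
    return " ".join(
        VOCABULARIES[axis][components[axis]] for axis in VOCABULARIES if axis in components
    )


if __name__ == "__main__":
    print(definition({"quantity": "temperature", "medium": "sea_water", "level": "surface"}))
    # temperature in sea water at the surface
```

If two legal combinations ever needed different meanings but produced the same generated definition, that would be the signal, per the paragraph above, that an axis is missing.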

Semantics and Ontologies: With this proposal, we are much further into creating classification systems for all concepts relevant to CF names (as opposed to conceptually linking the existing CF concepts, which is slightly different). I think this is inevitably a direction someone will take (witness the Plasmo work), but it turns the process into something very much like other knowledge classification efforts in the semantic community. That isn't a pro or a con, just an observation. There are lessons to be learned and tools to be reused from work that has gone before. In that regard, I would love to hear of existing vocabularies (formal or informal) for each of these categories, particularly the first two. (Can we start a wiki page for this info somewhere?)

In summary, I love this idea in principle, but think we can expect a stately progression toward seeing it in action. It serves a different need and audience than Standard Names do, and so perhaps should be considered and developed separately, not necessarily as a replacement for them.


John

--------------
John Graybeal   <mailto:[EMAIL PROTECTED]>  -- 831-775-1956
Monterey Bay Aquarium Research Institute
Marine Metadata Interoperability Project: http://marinemetadata.org

_______________________________________________
CF-metadata mailing list
[email protected]
http://mailman.cgd.ucar.edu/mailman/listinfo/cf-metadata
