This message came from the CF Trac system.  Do not reply.  Instead, enter your 
comments in the CF Trac system at http://kitt.llnl.gov/trac/.

#99: Taxon Names and Identifiers
-----------------------------+------------------------------
  Reporter:  lowry           |      Owner:  cf-conventions@…
      Type:  enhancement     |     Status:  new
  Priority:  high            |  Milestone:
 Component:  cf-conventions  |    Version:
Resolution:                  |   Keywords:
-----------------------------+------------------------------

Comment (by graybeal):

 I see this ticket, on Taxon Names and Identifiers, has not been addressed
 since the original discussion over a year ago.

 I think it is important that this ticket move forward. Though Roy's
 team may have moved on, this problem will need to be addressed in CF
 sooner or later. While only Roy, Jonathan, and I have discussed it, I
 suspect many CF lurkers need this capability.

 The following issues seem acceptably resolved:
 - promoting 6.1.1 on "Geographic regions" to 6.3 (i.e., removing it from
 6.1), and adding Roy's proposed section on taxa as 6.4. Then 6.1 and 6.2
 will describe mechanisms in CF, and 6.3 and 6.4 applications of these
 mechanisms.
 - Initial text rewording by Jonathan: "A taxon is a named level within a
 biological classification, such as a class, genus and species. Quantities
 dependent on taxa have generic standard_names containing the word taxon,
 and the taxa are identified by auxiliary coordinate variables."
 - Requiring name and identifier is reasonable (to make the description
 self-contained).

 The following questions are open:
 - How many identifiers (one per source) should be given for a taxon when
 several are available? Roy suggested 1, Jonathan recommends 2, John
 suggests user's choice.
 - How many sources should CF recognize? Roy suggested 2 (extensible); John
 says CF should not limit them (and if it does, the 2 suggested are not the
 best 2).
 - What kind of identifier? Roy suggested namespace + ':' + local text ID;
 Jonathan proposed (agreeable to Roy) separate int variables for the WORMS
 AphiaID and the ITIS taxonomic serial number (TSN); and John prefers
 globally unique identifiers, LSIDs being the common practice (ITIS does not
 offer them directly, only indirectly through Catalog of Life). In
 Jonathan's scheme each ID type would have a separate int variable,
 dimensioned to the number of taxa being defined.
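 To make the contrast between Roy's namespace-prefixed text IDs and
 Jonathan's per-source int variables concrete, here is a minimal Python
 sketch that converts the former into the latter. The namespace prefixes
 `aphia` and `itis`, the function names, and the fill values (-1 and 0,
 borrowed from the example further down) are illustrative assumptions,
 not anything agreed in this ticket.

```python
from typing import Dict, List, Tuple

# Hypothetical namespace prefixes and their fill values (not agreed in the ticket).
FILL_VALUES: Dict[str, int] = {"aphia": -1, "itis": 0}


def split_namespaced_id(text: str) -> Tuple[str, str]:
    """Split a 'namespace:localID' identifier at its first colon."""
    namespace, sep, local_id = text.partition(":")
    if not sep or not namespace or not local_id:
        raise ValueError(f"not a namespace:ID identifier: {text!r}")
    return namespace, local_id


def to_per_source_arrays(ids_per_taxon: List[List[str]]) -> Dict[str, List[int]]:
    """Turn namespaced IDs into one int array per source, one entry per taxon.

    A taxon with no ID from a given source gets that source's fill value.
    """
    n_taxa = len(ids_per_taxon)
    arrays = {ns: [fill] * n_taxa for ns, fill in FILL_VALUES.items()}
    for i, ids in enumerate(ids_per_taxon):
        for text in ids:
            namespace, local_id = split_namespaced_id(text)
            if namespace not in arrays:
                raise KeyError(f"unknown namespace: {namespace!r}")
            arrays[namespace][i] = int(local_id)
    return arrays


print(to_per_source_arrays([["aphia:1", "itis:42"], ["aphia:32768"], ["itis:7776"]]))
# {'aphia': [1, 32768, -1], 'itis': [42, 0, 7776]}
```

 The trade-off in the question shows up directly: the prefixed-string form
 is open-ended, while the per-source form needs the list of sources (and a
 fill value for each) fixed up front.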

 (Incidentally, http://www.jbiomedsem.com/content/2/1/7 provides a detailed
 analysis of the Catalog of Life identifier approach, which integrates the
 data from ITIS, WORMS, and Species 2000, among many others, and includes
 thoughts on why the CoL approach wasn't more widely adopted, at least at
 that time. Another extended discussion is at
 http://soyouthinkyoucandigitize.wordpress.com/2013/01/28/what-gets-linked-to-global-unique-identifiers-guids-in-natural-history-collection-digitization/.
 The point is that while we could go round and round on this, I want to
 cleanly account for more than what one part of the CF community does
 today, if we can.)

 Looking for a common path, I think the following comes pretty close:
 - Support multiple identifier sources, each to be provided _if
 available_
   - if a taxon isn't in ITIS or WORMS, it should still be citable
   - if the user always uses WORMS, we should not force them to translate
 to ITIS, and vice versa
   - While I happen to think Catalog of Life is more suitable than ITIS,
 I'll forgo the argument as long as we aren't exclusive
 - Use Jonathan's proposed approach for WORMS and ITIS, but allow
 extension to other sources of globally unique identifiers (e.g., CoL).
 Any globally unique identifier would be given the standard name
 taxon_global_identifier, and could be text (as most will be) or int (for
 UUIDs, for example)
   - Comparing identifiers from different sources will inevitably be done
 at a domain-specific application level, well beyond the concern of CF
 (but readily achievable by domain experts)
   - It won't be necessary to define unique identifier types for each
 source, since globally unique identifiers are by their nature
 distinguishable and uniquely relatable to their source
   - If we accept this adjustment, we don't have to argue over whether
 Catalog of Life is better than ITIS (which I believe it is, not so much
 because of its LSIDs as because it covers many more sources than just
 ITIS).
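 For LSIDs at least, relating an identifier back to its source is
 mechanical: the authority is the third colon-separated field. A minimal
 Python sketch, assuming the standard
 urn:lsid:authority:namespace:object[:revision] layout; `parse_lsid` and
 `LSID` are hypothetical names, and other kinds of globally unique
 identifier (bare UUIDs, for example) would need different handling.

```python
from typing import NamedTuple, Optional


class LSID(NamedTuple):
    authority: str           # e.g. the issuing organization's domain
    namespace: str
    object_id: str
    revision: Optional[str]  # None when the LSID has no revision part


def parse_lsid(text: str) -> LSID:
    """Parse 'urn:lsid:authority:namespace:object[:revision]' into its parts."""
    parts = text.split(":")
    # Require 5 or 6 fields and a (case-insensitive) 'urn:lsid' prefix.
    if len(parts) not in (5, 6) or [p.lower() for p in parts[:2]] != ["urn", "lsid"]:
        raise ValueError(f"not an LSID: {text!r}")
    authority, namespace, object_id = parts[2:5]
    if not (authority and namespace and object_id):
        raise ValueError(f"empty LSID component in {text!r}")
    revision = parts[5] if len(parts) == 6 else None
    return LSID(authority, namespace, object_id, revision)


lsid = parse_lsid(
    "urn:lsid:catalogueoflife.org:taxon:f33e0fe1-ac8e-11e3-805d-020044200006:col20140401"
)
print(lsid.authority)  # catalogueoflife.org
```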

 So this might give us the following example:
 {{{
 dimensions:
   taxa = 3;
 variables:
   string taxon_name(taxa);          // name of each taxon
   int aphiaID(taxa);                // WORMS AphiaID
     aphiaID:_FillValue=-1;
     aphiaID:standard_name="taxon_identifier";
   int tsn(taxa);                    // ITIS taxonomic serial number
     tsn:_FillValue=0;
     tsn:standard_name="taxonomic_serial_number";
   string col(taxa);                 // Catalog of Life LSID
     col:_FillValue="null";
     col:standard_name="taxon_global_identifier";
     col:comment="LSID from Catalog of Life";
 data:
   taxon_name="Homo sapiens", "Fraxinus excelsior", "Struthio camelus";
   aphiaID=1,32768,-1;               // -1: not available from WORMS
   tsn=42,0,7776;                    // 0: not available from ITIS
   col="urn:lsid:catalogueoflife.org:taxon:f33e0fe1-ac8e-11e3-805d-020044200006:col20140401",
       "urn:lsid:catalogueoflife.org:taxon:0ad7462a-ac8f-11e3-805d-020044200006:col20140401",
       "urn:lsid:catalogueoflife.org:taxon:ebff2886-ac8e-11e3-805d-020044200006:col20140401";
 }}}
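 Because each source is provided only if available, a reader has to treat
 each variable's _FillValue as "no identifier from this source". A minimal
 Python sketch of that, with the example's data typed in as plain lists
 rather than read from a netCDF file (so no netCDF library is assumed);
 `identifiers_for_taxa` is a hypothetical helper, not part of any CF tooling.

```python
from typing import Dict, Sequence, Union

Value = Union[int, str]


def identifiers_for_taxa(
    names: Sequence[str],
    id_variables: Dict[str, Sequence[Value]],
    fill_values: Dict[str, Value],
) -> Dict[str, Dict[str, Value]]:
    """Map each taxon name to the identifiers actually present for it.

    An entry equal to its variable's _FillValue means "not available from
    this source" and is left out.
    """
    # Every identifier variable must share the taxa dimension with the names.
    for var_name, values in id_variables.items():
        if len(values) != len(names):
            raise ValueError(f"{var_name} is not dimensioned like taxon_name")
    result: Dict[str, Dict[str, Value]] = {}
    for i, name in enumerate(names):
        result[name] = {
            var_name: values[i]
            for var_name, values in id_variables.items()
            if values[i] != fill_values[var_name]
        }
    return result


# Data copied from the CDL example above.
names = ["Homo sapiens", "Fraxinus excelsior", "Struthio camelus"]
ids: Dict[str, Sequence[Value]] = {
    "aphiaID": [1, 32768, -1],
    "tsn": [42, 0, 7776],
    "col": [
        "urn:lsid:catalogueoflife.org:taxon:f33e0fe1-ac8e-11e3-805d-020044200006:col20140401",
        "urn:lsid:catalogueoflife.org:taxon:0ad7462a-ac8f-11e3-805d-020044200006:col20140401",
        "urn:lsid:catalogueoflife.org:taxon:ebff2886-ac8e-11e3-805d-020044200006:col20140401",
    ],
}
fills: Dict[str, Value] = {"aphiaID": -1, "tsn": 0, "col": "null"}

for name, found in identifiers_for_taxa(names, ids, fills).items():
    print(name, sorted(found))
# Homo sapiens ['aphiaID', 'col', 'tsn']
# Fraxinus excelsior ['aphiaID', 'col']
# Struthio camelus ['col', 'tsn']
```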

-- 
Ticket URL: <http://kitt.llnl.gov/trac/ticket/99#comment:9>
CF Metadata <http://cf-convention.github.io/>