This message came from the CF Trac system.  Do not reply.  Instead, enter your 
comments in the CF Trac system at https://cf-pcmdi.llnl.gov/trac/.

#99: Taxon Names and Identifiers
-----------------------------+----------------------------------------------
  Reporter:  lowry           |       Owner:  [email protected]
      Type:  enhancement     |      Status:  new                          
  Priority:  high            |   Milestone:                               
 Component:  cf-conventions  |     Version:                               
Resolution:                  |    Keywords:                               
-----------------------------+----------------------------------------------
Comment (by jonathan):

 Dear John

 I am sure you and Roy know more about the available taxonomic databases
 than I do. If CF isn't going to provide its own, I think we should be
 explicit about which ones should be used, and there should be as few as
 possible. That is
 because, in the limiting case that every data provider used a different
 taxonomic database, the datasets would no longer be comparable. You
 wouldn't know whether Graybeal species number 94308 was the same as Lowry
 species number 612095, even if they did have the same species name, since
 the names are not regarded as reliable. So I don't think we ought to leave
 it open to the data writer to use any database they deem to be acceptable.

 Ideally we would have only one external authority, but Roy says that is
 not sufficient, and suggests using two. To maximise portability of data,
 I therefore suggested recommending that ''both'' be used. However, in
 some cases the species concerned will be in one but not the other, and
 the corresponding auxiliary coordinate will then contain missing data.
 For instance:

 {{{
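 dimensions:
   taxa=3;
   string80=80;  // assumed string length for the names
 variables:
   char taxon_name(taxa,string80);  // declared so the data section is valid
   int aphiaID(taxa);  // WoRMS identifier
     aphiaID:_FillValue=-1;
     aphiaID:standard_name="taxon_identifier";
   int tsn(taxa);  // ITIS identifier
     tsn:_FillValue=0;
     tsn:standard_name="taxonomic_serial_number";
 data:
   taxon_name="Homo sapiens", "Fraxinus excelsior", "Struthio camelus";
   aphiaID=1,32768,-1;  // -1: not in WoRMS
   tsn=42,0,7776;  // 0: not in ITIS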
 }}}

 In this entirely made-up example, ''F. excelsior'' appears in WoRMS but
 not ITIS, while ''S. camelus'' is in ITIS but not WoRMS, so there are
 missing data elements in the auxiliary coordinate variables.
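
 As a minimal sketch of how reading software might treat those missing
 elements, the Python below copies the made-up values and fill values from
 the example and records, for each taxon, only the identifiers that are
 actually present. The function names are hypothetical and nothing here is
 part of CF; it only illustrates the idea.

```python
# Fill values and data copied from the made-up CDL example.
APHIA_FILL = -1
TSN_FILL = 0

taxon_name = ["Homo sapiens", "Fraxinus excelsior", "Struthio camelus"]
aphia_id = [1, 32768, -1]
tsn = [42, 0, 7776]


def mask_fill(values, fill_value):
    """Replace fill values with None so that missing entries are explicit."""
    return [None if v == fill_value else v for v in values]


def available_identifiers(names, aphia_ids, tsns):
    """Map each taxon name to the identifiers that are present for it."""
    table = {}
    masked_aphia = mask_fill(aphia_ids, APHIA_FILL)
    masked_tsn = mask_fill(tsns, TSN_FILL)
    for name, a, t in zip(names, masked_aphia, masked_tsn):
        ids = {}
        if a is not None:
            ids["aphiaID"] = a
        if t is not None:
            ids["tsn"] = t
        table[name] = ids
    return table


identifiers = available_identifiers(taxon_name, aphia_id, tsn)
```

 Software with a preference for one authority can then look for that key
 first and fall back to the other when it is absent.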

 I think if both are provided, as recommended, they should be consistent
 and it is an error if they are not. For example, TSN 42 might actually be
 ''Pan troglodytes'' rather than ''H. sapiens''. This would be an error. If
 we just said, "let WoRMS take precedence", the purpose of providing TSN as
 well would be undermined. If we provide both and they are consistent,
 software with a preference for one of them can use that one. If they are
 not guaranteed to be consistent, you would get different results depending
 on which identifier you use.
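
 The consistency requirement could be checked along the following lines.
 This is only a sketch: the two lookup tables are hypothetical stand-ins
 for queries to WoRMS and ITIS, filled with the same made-up numbers as
 above, including the deliberate error that TSN 42 resolves to
 ''P. troglodytes''.

```python
# Hypothetical lookup tables standing in for WoRMS and ITIS queries.
# The identifiers and names are made up, as in the example above.
WORMS_NAMES = {1: "Homo sapiens", 32768: "Fraxinus excelsior"}
ITIS_NAMES = {42: "Pan troglodytes", 7776: "Struthio camelus"}


def find_inconsistencies(aphia_ids, tsns, aphia_fill=-1, tsn_fill=0):
    """Return (index, WoRMS name, ITIS name) for each conflicting element.

    Elements where either identifier is missing are skipped, because
    there is nothing to compare. An identifier that cannot be resolved
    in its table is reported as a conflict.
    """
    conflicts = []
    for i, (a, t) in enumerate(zip(aphia_ids, tsns)):
        if a == aphia_fill or t == tsn_fill:
            continue
        worms_name = WORMS_NAMES.get(a)
        itis_name = ITIS_NAMES.get(t)
        if worms_name is None or itis_name is None or worms_name != itis_name:
            conflicts.append((i, worms_name, itis_name))
    return conflicts


conflicts = find_inconsistencies([1, 32768, -1], [42, 0, 7776])
```

 Here only the first element carries both identifiers, and it is reported
 because the two authorities disagree about the species.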

 Best wishes

 Jonathan

-- 
Ticket URL: <https://cf-pcmdi.llnl.gov/trac/ticket/99#comment:3>
CF Metadata <http://cf-pcmdi.llnl.gov/>