[API-users] Is there a GBIF specific LSID that can be used?

Donat Agosti Wed, 20 Aug 2014 08:07:14 +0000

Hi Markus

"If we had better information about types and original name usages (protonyms, 
basionyms) we could try to assign stable ids to a fixed set of protonyms in the 
GBIF backbone. Does that sound reasonable?"

A stable ID for a protonym, or more generally for treatments of any sort of
taxonomic name usage, is what Plazi supplies, by minting a stable HTTP URI.
This is also what we supply to you with the DwC-A for observation records from
the literature.

The treatment also provides metadata that, among other things, resolves to the
source article.

Right now, we make sure that all those article links are DOIs, either a
CrossRef DOI or a DataCite DOI minted through the Biodiversity Literature
Repository.

The treatment HTTP URI is also, as far as possible, linked with URIs minted by
ZooBank, and coordinated with Pensoft journals and others, if they want to.

This is relevant, since all protonyms will hopefully be included in ZooBank.

We are working on an RDF representation of the treatments that should be
available around TDWG this year.

Since a taxonomic name usage is linked to a particular place in a
publication, normally given by the page number complementing the bibliographic
reference, we do want to provide this resolution. Really, we want to provide
the content, so we extract the treatments and make legacy data accessible.
This also allows us to provide you with material citations that can be linked
back to the treatment through its UI. For prospective publications, the same
is done with TaxPub, which has a treatment element (see the Pensoft
publications).

So you could provide a link to the treatments in Plazi, which you already have
for all the TNUs we supply.

Donat

From: api-users-bounces at lists.gbif.org On Behalf Of Markus Döring
Sent: Tuesday, August 19, 2014 12:38 PM
To: Richard Pyle
Cc: api-users at lists.gbif.org
Subject: Re: [API-users] Is there a GBIF specific LSID that can be used?

Hi Rich, Rod & Rob,

thanks for this interesting taxonomic / GNA discussion. It might be a little
confusing and boring to GBIF API users, so maybe we should continue privately
and restrict discussion on this list to GBIF API-related topics.

The original question was whether GBIF provides LSIDs or other globally
unique identifiers for GBIF backbone taxa. As we only have local IDs now, and
GBIF will be able to issue DataCite DOIs very soon, I wondered whether it would
help to mint DOIs on top of the local IDs to make them globally unique. Any
thoughts on this would be appreciated.
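A minimal sketch of what "minting DOIs on top of the local ids" could look like, assuming the integer key is reused verbatim as the DOI suffix. The prefix `10.XXXXX` is a placeholder (no prefix is named in this thread), and the function names are hypothetical.

```python
# Placeholder prefix: no DOI prefix is named in this thread, so "10.XXXXX" is hypothetical.
DOI_PREFIX = "10.XXXXX"


def backbone_doi(usage_key: int, prefix: str = DOI_PREFIX) -> str:
    """Mint a DOI name by reusing a local backbone key as the DOI suffix."""
    # Assumed sanity check: backbone keys are positive integers.
    if usage_key <= 0:
        raise ValueError("backbone keys are positive integers")
    return f"{prefix}/{usage_key}"


def doi_url(doi: str) -> str:
    """Make the DOI actionable through the DOI proxy used elsewhere in this thread."""
    return f"http://dx.doi.org/{doi}"


# 2396049 is the Ecsenius bicolor key quoted later in this thread.
print(doi_url(backbone_doi(2396049)))
```

The design choice is deliberately trivial: because the local key stays the suffix, the DOI is only as stable as the key itself, which is why Markus says this needs careful evaluation first.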

A checklist bank id refers to a "name usage" and is similar to a TNU in GNUB I 
suppose. It identifies a taxon name being used within a certain (taxonomic) 
dataset and can refer to either an accepted taxon or a synonym. Identifiers are 
stable over different versions of the backbone, but the exact classification 
and list of synonyms for an accepted taxon is allowed to change. In the near 
future I would also like to allow the name string to change in case of 
misspellings and other small variations.
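To make the "name usage" semantics concrete, here is a minimal sketch assuming a simplified record (not ChecklistBank's real schema): the key is stable across versions, while the name string and classification may change, and a synonym points at its accepted usage. The misspelling and the synonym record are hypothetical.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple


@dataclass(frozen=True)
class NameUsage:
    key: int                              # stable local id, kept across backbone versions
    scientific_name: str                  # label; may be corrected (e.g. a misspelling)
    accepted_key: Optional[int] = None    # None: accepted taxon; else key of the accepted usage
    classification: Tuple[str, ...] = ()  # higher taxa; allowed to change between versions

    @property
    def is_synonym(self) -> bool:
        return self.accepted_key is not None


# Version 1 carries a hypothetical misspelling; version 2 fixes the name and
# extends the classification but keeps the same key.
v1 = NameUsage(2396049, "Ecsenius bicolour", classification=("Blenniidae",))
v2 = replace(v1, scientific_name="Ecsenius bicolor",
             classification=("Blenniidae", "Ecsenius"))

# A hypothetical synonym usage pointing at the accepted one.
syn = NameUsage(999, "Hypothetical synonym", accepted_key=2396049)
```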

For concrete implementations it is quite a challenge to come up with a clear
definition of when a *taxon* identifier should change and when it should remain
the same. Would users like to see true taxon concept identifiers for the GBIF 
backbone that remain stable as long as GBIF regards the taxon still the same 
whatever scientific name is used as the currently accepted label? If we had 
better information about types and original name usages (protonyms, basionyms) 
we could try to assign stable ids to a fixed set of protonyms in the GBIF 
backbone. Does that sound reasonable?
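One way to read the protonym idea is sketched below, under the assumption that each backbone taxon is anchored to one fixed protonym and derives its id from that anchor alone, so changing the accepted name leaves the id untouched. All ids and names are hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Protonym:
    protonym_id: str      # hypothetical stable id of the original name usage
    original_name: str


@dataclass
class BackboneTaxon:
    protonym: Protonym    # fixed anchor chosen when the taxon is first created
    accepted_name: str    # current label; free to change

    @property
    def taxon_id(self) -> str:
        # The id depends only on the anchoring protonym, never on the current name.
        return f"taxon:{self.protonym.protonym_id}"


taxon = BackboneTaxon(Protonym("p-001", "Aus bus Linnaeus 1758"), "Aus bus Linnaeus 1758")
before = taxon.taxon_id
taxon.accepted_name = "Cus bus (Linnaeus 1758)"   # hypothetical transfer to another genus
print(taxon.taxon_id == before)
```

This sketch ignores lumping and splitting, where several protonyms end up in one taxon or one protonym's taxon is divided; those are exactly the cases where deciding whether a taxon is "still the same" gets hard.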

Cheers,
Markus

On 18 Aug 2014, at 23:20, Richard Pyle <deepreef at bishopmuseum.org> wrote:

Since Rod opened the can of worms, I'll dig into it and feast along with the
others.

Here is what seven years of NOMINA (http://globalnames.org/Nomina) meetings, 
plus millions of conversations at TDWG, Pro-iBiosphere, ICZN, ICB, iDigBio and 
many other regional, national, and international conferences, plus millions of 
dollars of targeted funding from various sources to drive the Global Names 
initiative....has led us to.

First, the biodiversity informatics realm is full of name-strings.  These are 
strings of text characters, usually encoded as UTF-8, purported to represent 
taxon names of organisms.  They may or may not include authorships, and/or 
abbreviations, and/or qualifiers of various sorts.  These are the things that 
are indexed in GNI (http://gni.globalnames.org).

I completely agree with Rod that a "taxon name" is much more than just the 
string of UTF-8 characters used to render it.  For clarity of communication (as 
if that were even possible in these kinds of discussions), I refer to these as 
"name objects".  They are conceptual (abstract) constructs, and are uniquely 
represented by a rich suite of metadata (publication metadata in which the name 
was originally established in accordance with a nomenclatural Code, authorship 
metadata, type specimen or type taxon metadata, etc.). A single taxon name 
might be represented via different name-strings (e.g., different alternate 
spellings, different genus combinations, etc.), and a single name-string might 
be applied to different name-objects (homonyms & homographs).
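The many-to-many relationship between name-strings and name-objects can be sketched as a simple index. The name-object ids, spellings and authorships below are hypothetical; the point is that one string can map to several name-objects (a homonym), and one name-object can be reached from several strings (variant spellings).

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# Hypothetical name-objects: (id, name-strings observed in databases for it).
OBSERVED = [
    ("N1", ["Aus bus", "Aus bus Linnaeus 1758", "Aus buus"]),  # incl. a variant spelling
    ("N2", ["Aus bus", "Aus bus Smith 1900"]),                 # a homonym of N1
]


def index_strings(observed: Iterable[Tuple[str, Iterable[str]]]) -> Dict[str, Set[str]]:
    """Map each name-string to every name-object it has been used for."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for object_id, strings in observed:
        for s in strings:
            index[s].add(object_id)
    return dict(index)


index = index_strings(OBSERVED)
ambiguous = sorted(s for s, ids in index.items() if len(ids) > 1)
print(ambiguous)  # ['Aus bus']
```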

And, again, I completely agree with Rod that a "taxon" (=taxon concept, = 
taxonomic circumscription) is something else - it is another conceptual 
(abstract) construct, typically represented by a broader collection of 
metadata, including things like included child taxa, included synonym taxa, 
biological characters, and possibly other stuff such as geographic 
distribution. A single taxon might have more than one taxon name applied to it 
(synonyms), and a single taxon name (in the name-object sense, not just the 
name-string sense) might have been used to represent different taxon concepts 
(e.g., sensu stricto vs. sensu lato senses of the same name-object). The most 
practical way to refer to a taxon is the combination of a name-object (as 
described above), plus usage instance, e.g. "Aus bus Linnaeus 1758 sec. Pyle 
2014"  (the part before the "sec." represents the name-object, and the part 
after the "sec." refers to the specific usage instance that applies the 
name-object to a taxon concept).
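The "name-object + sec." convention maps naturally onto a pair of values. Below is a minimal sketch assuming plain strings for the name, authorship and usage reference; "Smith 2010" is a hypothetical second usage of the same name-object.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NameObject:
    name: str
    authorship: str


@dataclass(frozen=True)
class TaxonConcept:
    name_object: NameObject
    sec: str  # the usage instance that applies the name-object to a circumscription

    def label(self) -> str:
        return f"{self.name_object.name} {self.name_object.authorship} sec. {self.sec}"


aus_bus = NameObject("Aus bus", "Linnaeus 1758")
narrow = TaxonConcept(aus_bus, "Pyle 2014")
broad = TaxonConcept(aus_bus, "Smith 2010")  # hypothetical second usage of the same name
print(narrow.label())  # Aus bus Linnaeus 1758 sec. Pyle 2014
```

Both concepts share one name-object but compare as different concepts, which is the sensu stricto vs. sensu lato situation described above.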

Classifications (per se) are a little bit different, but are often included in 
the taxon concept space, even though they are technically not (logically) part 
of the taxon concept.  The taxon concept is really the circumscribed set of 
organisms included within the concept.  Changing the higher classification, by 
itself, has no impact on the circumscribed set of organisms included within the 
concept.  However, that's a topic for another can-of-worms discussion.
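To illustrate the distinction between circumscription and classification, here is a sketch that derives a concept id only from the set of included organisms (stood in for by hypothetical specimen ids) and keeps the parent taxon outside that id. In practice the full set of organisms is never known, so this is purely an illustration of the logic, not a workable identifier scheme.

```python
import hashlib
from dataclasses import dataclass
from typing import FrozenSet, Optional


def circumscription_id(members: FrozenSet[str]) -> str:
    """Derive an id from the included organisms only (order-independent)."""
    canonical = "\n".join(sorted(members))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


@dataclass
class Taxon:
    members: FrozenSet[str]       # hypothetical specimen ids standing in for organisms
    parent: Optional[str] = None  # higher classification, kept outside the concept

    @property
    def concept_id(self) -> str:
        return circumscription_id(self.members)


t = Taxon(frozenset({"spec-1", "spec-2", "spec-3"}), parent="Aidae")
before = t.concept_id
t.parent = "Bidae"                    # reclassified: the concept id is unchanged
unchanged = t.concept_id == before
t.members = t.members | {"spec-4"}    # circumscription changed: a new concept id
changed = t.concept_id != before
```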

So.... The seven years of NOMINA meetings, millions of conversations and 
millions of dollars has revealed that the notion of a "Taxon Name Usage" 
instances (TNU), as indexed in the Global Names Usage Bank (GNUB), is an 
extremely powerful unit that addresses taxon names (name-objects), taxon 
concepts, and classifications; all with a single domain of identifiers (minted 
for TNUs).  Rob Whitton and I have functioning prototypes that demonstrate the 
power of TNUs for managing nomenclatural, taxonomic, and classification data; 
and we just last week submitted a proposal to NSF to expand these prototypes 
into full-function services.
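A minimal sketch of how a single domain of TNU identifiers can cover names, concepts and classification. This is an assumed simplification, not GNUB's actual schema: each TNU is one use of a name in one reference, a protonym is a TNU that points to no earlier protonym, and classification is a parent link between TNUs. The UUID for the Lehr 2007 protonym is the one quoted later in this thread; the later usage is hypothetical.

```python
import uuid
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class TNU:
    name: str
    reference: str                           # publication in which the name is used
    protonym_id: Optional[uuid.UUID] = None  # None: this TNU is itself a protonym
    parent_id: Optional[uuid.UUID] = None    # classification: parent TNU in the same reference
    tnu_id: uuid.UUID = field(default_factory=uuid.uuid4)

    def protonym(self) -> uuid.UUID:
        """Every TNU resolves to a protonym, which stands for the name-object."""
        return self.tnu_id if self.protonym_id is None else self.protonym_id


original = TNU("Pristimantis vilcabambae", "Lehr 2007",
               tnu_id=uuid.UUID("4B913B74-E880-4EC9-B0A9-F3AB9F02288B"))
later = TNU("Pristimantis vilcabambae", "a later checklist (hypothetical)",
            protonym_id=original.tnu_id)
```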

The seven years and millions of conversations and dollars has also taught us 
that the most practical way to manage this information in biodiversity 
informatics-land is through two nodes:  a "dirty bucket" (GNI name-strings), 
and a "clean bucket" (GNUB).  Dima Mozzherin has new funding from NSF to begin 
developing the service workflows to bridge name-strings (as they exist in most 
biodiversity databases) to Protonyms (the subset of TNUs that represent 
name-objects).  Starting in October, we will begin to bridge our respective 
prototypes (funded by NSF through the Global Names project) into a seamless 
tool.  We hope to have something more meaningful to say about this at TDWG; but 
one of the key things to keep in mind is that GNA (which includes GNI & GNUB) 
are low-level cross-linking tools and services - NOT replacements for CoL, 
ITIS, EOL, GBIF, WoRMS, NCBI taxonomy, etc., etc., etc.  These other 
initiatives provide the information that end-users actually want.  The role of 
GNA is to provide a core infrastructure (analogous to DNS) that most people use 
every day without ever knowing it.

The DOI thing is a bit of a misdirection.  The "identifiers" (sensu non-LOD 
world) for name-strings are managed by GNI, and for TNUs by GNUB.  Both are 
UUIDs, and as such are pure identifiers (i.e., not actionable by themselves). 
DOI is one of many possible identifier dereferencing services (ARK is another,
and there are a host of others).  DOI happens to be a particularly robust and
useful dereferencing service, and as such it makes perfect sense to me to
represent TNU identifiers as DOIs, as long as someone has the funding to make 
it happen.
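The point that a UUID is a pure identifier, made actionable by one or more dereferencing services, can be sketched like this. The ZooBank URL pattern comes from this thread; the DOI prefix `10.XXXXX` is the same placeholder used here and does not resolve. Upper-casing the UUID simply mirrors the ZooBank URLs quoted in this thread.

```python
import uuid
from typing import Dict

# Resolver templates: ZooBank is taken from this thread; the DOI prefix is a placeholder.
RESOLVERS = {
    "zoobank": "http://zoobank.org/{id}",
    "doi": "http://dx.doi.org/10.XXXXX/{id}",
}


def actionable(identifier: str) -> Dict[str, str]:
    """Validate a pure UUID identifier and render it through each resolver."""
    u = uuid.UUID(identifier)       # raises ValueError if it is not a UUID
    canonical = str(u).upper()      # match the upper-case form used in the ZooBank URLs above
    return {name: tpl.format(id=canonical) for name, tpl in RESOLVERS.items()}


links = actionable("4b913b74-e880-4ec9-b0a9-f3ab9f02288b")
print(links["zoobank"])  # http://zoobank.org/4B913B74-E880-4EC9-B0A9-F3AB9F02288B
```

Adding another service is just another template entry; the identifier itself never changes, which is the "re-use identifiers, don't reinvent wheels" argument in code form.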

So... to follow up on Rod's example, the TNU representing the "name-object" for
the species epithet "vilcabambae", as originally established in the publication 
Lehr 2007, is:
4B913B74-E880-4EC9-B0A9-F3AB9F02288B

Alone, that UUID does even less for you than the text-string "Pristimantis 
vilcabambae"  does.  However, combining it with a dereferencing service, such 
as http://zoobank.org/, you can start doing some more interesting things:
http://zoobank.org/4B913B74-E880-4EC9-B0A9-F3AB9F02288B

For example, you can get to the original publication as registered in ZooBank 
(http://zoobank.org/37BFC245-DDD6-4AB4-B4B1-DD6826B86873), which gets you a 
link to the DOI and the ResearchGate page for this reference.  You can also get 
a link to the GBIF page, ITIS page, EOL page, ION page, and a few others (you'd 
also get links to the ASW site, if they had continued to expose their internal 
identifiers; though now it seems that they don't anymore).  You also see a call 
to BHL's OpenURL service to "automagically" get the page image of the original 
description.  And you get a resultset from GNI to see links to other datasets.
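The BHL OpenURL call mentioned above could look roughly like the sketch below. The endpoint URL and the OpenURL 0.1-style parameter names are assumptions to check against BHL's documentation; the ISSN, volume, start page and year are read off the Lehr 2007 DOI cited in Rod's message. The sketch only builds the URL and makes no network request.

```python
from urllib.parse import urlencode

# Assumed endpoint and OpenURL 0.1-style keys; verify against BHL's documentation.
BHL_OPENURL = "https://www.biodiversitylibrary.org/openurl"


def bhl_openurl(issn: str, volume: str, spage: str, year: str) -> str:
    """Build an OpenURL query asking BHL for the page at the start of an article."""
    params = {
        "genre": "article",
        "issn": issn,
        "volume": volume,
        "spage": spage,
        "date": year,
        "format": "json",
    }
    return f"{BHL_OPENURL}?{urlencode(params)}"


# Values taken from DOI 10.3099/0027-4100(2007)159[145:NEFLPP]2.0.CO;2
url = bhl_openurl("0027-4100", "159", "145", "2007")
```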

And that's all from just ONE metadata dereferencing service (ZooBank).  I think 
it would be WONDERFUL to have this identifier represented within DOI-space as 
well (e.g., http://dx.doi.org/10.XXXXX/4B913B74-E880-4EC9-B0A9-F3AB9F02288B), 
but someone needs to step forward as the "XXXXX" domain to mint the DOI.  By 
doing so, not only would you be plugged into the GNA infrastructure (as 
described above), but also the CrossRef infrastructure and all the whizbang 
services that it provides.  PLAZI and GNA have agreed that a taxon treatment = 
a TNU, and hence will share the same UUIDs for them (thus opening up the PLAZI 
services for use with the same identifiers).

In summary, taxon name-strings, name-objects, concepts (and also
classifications) are very different things, with different implied properties, 
and different implied meanings.  GNA is well on its way to serving robust 
services based on persistent identifiers that are actionable through multiple 
dereferencing services. Including more dereferencing services (like DOI) is a 
GOOD thing!  Re-using identifiers is a GOOD thing.  Unnecessarily re-inventing 
wheels is NOT a particularly good thing.

Aloha,
Rich

P.S. The astute among you will have noticed that the GNA cross-links and
services (including ZooBank registrations) described above did not exist before 
I started replying to this email. And that is the POINT.  GNA is an 
INFRASTRUCTURE to allow *US* (we the biodiversity practitioners of the world) 
to cross-link content.  The fact that I was able to use the EXISTING GNA 
infrastructure to cross-link all these resources associated with the 
text-string name "Pristimantis vilcabambae" in FAR LESS time than it took me to 
compose this email message, speaks volumes about the potential that such an 
infrastructure can have.

From: api-users-bounces at lists.gbif.org On Behalf Of Roderic Page
Sent: Monday, August 18, 2014 3:53 AM
To: Rob Guralnick
Cc: api-users at lists.gbif.org
Subject: Re: [API-users] Is there a GBIF specific LSID that can be used?

Hi Rob,

At the risk of opening the whole taxon/name/concept can of worms, I'd see this 
a little differently.

For me a taxon name is a name + the original publication, rather than simply a 
text string. A taxon is different again, being essentially a statement about a 
collection of things that belong to the same taxon, and a statement of what to 
call them.

Taxon databases (e.g., GBIF) tend to use strings for names, when it would be
more elegant to use identifiers for names + publications.  We could go some way
towards cleaning the mess we've accumulated if we adopted (and reused)
identifiers for these things. For a start, name strings that don't map to
identifiers in nomenclators would immediately be under suspicion as being
potentially erroneous. It also links names to evidence, which is something
we're spectacularly bad at doing at the moment.
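The "under suspicion" test can be sketched as a lookup against a nomenclator. The one-entry nomenclator below is hypothetical (it reuses the ZooBank UUID quoted in this thread), and matching is exact apart from collapsing whitespace; real matching would need name parsing and fuzzy comparison of the kind GNI provides.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical nomenclator: name string -> identifier (here a ZooBank UUID from this thread).
NOMENCLATOR = {
    "Pristimantis vilcabambae": "4B913B74-E880-4EC9-B0A9-F3AB9F02288B",
}


def triage(name_strings: Iterable[str]) -> Tuple[Dict[str, str], List[str]]:
    """Split name strings into those with a nomenclator identifier and suspects."""
    matched: Dict[str, str] = {}
    suspect: List[str] = []
    for s in name_strings:
        key = " ".join(s.split())  # collapse stray whitespace only
        if key in NOMENCLATOR:
            matched[s] = NOMENCLATOR[key]
        else:
            suspect.append(s)
    return matched, suspect


# The last string is a hypothetical misspelling.
matched, suspect = triage(["Pristimantis vilcabambae",
                           "Pristimantis  vilcabambae",
                           "Pristimantis vilcabanbae"])
print(suspect)  # ['Pristimantis vilcabanbae']
```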

For example,  "Pristimantis vilcabambae" is a text string which isn't terribly 
useful. But if we combine that with details on where and when it was published 
we get something a bit more useful:

 "Pristimantis vilcabambae Lehr 2007 published in DOI 
http://dx.doi.org/10.3099/0027-4100(2007)159[145:NEFLPP]2.0.CO;2 "  This is the 
information I'm accumulating in BioNames, by combining metadata from ION LSIDs 
with data from CrossRef and BioStor, see
http://bionames.org/names/cluster/1949681
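Here is a minimal sketch of a "name string + publication" record, assuming the DOI is the publication identifier. This particular DOI happens to use a SICI-style suffix, so the year, volume and start page can be read straight out of it. The regex covers only this pattern and is not a general DOI parser.

```python
import re
from dataclasses import dataclass
from typing import Dict

# Matches the SICI-style part of some CrossRef DOIs, e.g. "(2007)159[145:".
SICI = re.compile(r"\((?P<year>\d{4})\)(?P<volume>\d+)\[(?P<spage>\d+):")


@dataclass(frozen=True)
class PublishedName:
    name: str
    authorship: str
    doi: str  # identity is name + publication, not the string alone

    def label(self) -> str:
        return f"{self.name} {self.authorship} published in DOI {self.doi}"

    def citation_parts(self) -> Dict[str, str]:
        m = SICI.search(self.doi)
        return m.groupdict() if m else {}


p = PublishedName("Pristimantis vilcabambae", "Lehr 2007",
                  "10.3099/0027-4100(2007)159[145:NEFLPP]2.0.CO;2")
print(p.citation_parts())  # {'year': '2007', 'volume': '159', 'spage': '145'}
```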

Should this "name string + publication" get a DOI? Sure. Then I'd want GBIF 
(and other taxon databases) to link to this name on their taxon pages. In other 
words, http://www.gbif.org/species/2425396 should have an identifier for the 
taxon name, instead of simply using a text string.

I'm beginning to sound like Rich Pyle, and he and I would almost certainly
model these things differently, but name strings <> taxon names <> taxa.

Regards

Rod

---------------------------------------------------------
Roderic Page
Professor of Taxonomy
Institute of Biodiversity, Animal Health and Comparative Medicine
College of Medical, Veterinary and Life Sciences
Graham Kerr Building
University of Glasgow
Glasgow G12 8QQ, UK

Email:  Roderic.Page at glasgow.ac.uk
Tel:  +44 141 330 4778
Skype:  rdmpage
Facebook:  http://www.facebook.com/rdmpage
LinkedIn:  http://uk.linkedin.com/in/rdmpage
Twitter:  http://twitter.com/rdmpage
Blog:  http://iphylo.blogspot.com
ORCID:  http://orcid.org/0000-0002-7101-9767
Citations:  http://scholar.google.co.uk/citations?hl=en&user=4Z5WABAAAAAJ

On 18 Aug 2014, at 14:29, Robert Guralnick <Robert.Guralnick at colorado.edu> wrote:

  Markus --- I think the answer to the question: "Would a taxon DOI be a 
valuable feature for you?" really depends on some of the details.  With a taxon 
name, you are putting a DOI on a string and one that has been dissociated from 
its source(s).  I would think more valuable would be a DOI linked to the 
checklist that contained the name, and maybe a passthrough (a la suffix 
passthroughs in the EZID system) to the individual name.  That way I can 
resolve that taxon name to the source from whence it came.
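Rob's idea of a checklist-level identifier with a suffix passed through to individual names can be sketched as longest-prefix resolution. This is modeled loosely on the suffix passthrough he mentions. Whether a particular DOI or ARK service supports it needs to be checked, and the registry entry and URLs below are hypothetical.

```python
# Hypothetical registry: checklist identifier -> landing URL of the checklist.
REGISTRY = {
    "doi:10.XXXXX/checklist-1": "http://example.org/checklists/1",
}


def resolve(identifier: str) -> str:
    """Resolve via the longest registered prefix, passing any remaining suffix through."""
    for prefix in sorted(REGISTRY, key=len, reverse=True):
        if identifier == prefix:
            return REGISTRY[prefix]
        if identifier.startswith(prefix + "/"):
            return REGISTRY[prefix] + identifier[len(prefix):]
    raise KeyError(identifier)


print(resolve("doi:10.XXXXX/checklist-1/names/42"))
```

Only the checklist needs to be registered; every name inside it becomes resolvable "from whence it came" without minting one identifier per name.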

Best, Rob

On Mon, Aug 18, 2014 at 3:44 AM, Markus Döring <mdoering at gbif.org> wrote:
Hello Geoff,

GBIF uses simple integers as taxon identifiers, for example 2396049 for
Ecsenius bicolor.
These ids are stable, but obviously not globally unique. If you need a URI
right now, I would recommend using our RESTful portal URL:
http://www.gbif.org/species/2396049
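Here is a small sketch of turning a backbone key into the portal URI recommended above, plus a fetch of the matching JSON record. The API URL assumes GBIF's public v1 species endpoint. `fetch_usage` needs network access and is not called here.

```python
import json
from urllib.request import urlopen

PORTAL_TEMPLATE = "http://www.gbif.org/species/{key}"   # the URI recommended above
API_TEMPLATE = "https://api.gbif.org/v1/species/{key}"  # assumed: GBIF's v1 species endpoint


def portal_uri(key: int) -> str:
    return PORTAL_TEMPLATE.format(key=key)


def api_url(key: int) -> str:
    return API_TEMPLATE.format(key=key)


def fetch_usage(key: int) -> dict:
    """Fetch the JSON record behind a backbone key (needs network access)."""
    with urlopen(api_url(key), timeout=10) as resp:
        return json.load(resp)


print(portal_uri(2396049))  # http://www.gbif.org/species/2396049
```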

For the future I could imagine us assigning DOIs to taxa reusing the current 
integer ids, but that has to be carefully evaluated first.
Would a taxon DOI be a valuable feature for you?

Cheers,
Markus

--
Markus Döring
Software Developer
Global Biodiversity Information Facility (GBIF)
mdoering at gbif.org
http://www.gbif.org

On 05 Aug 2014, at 05:17, Geoff Shuetrim <geoff at galexy.net> wrote:

> Working with a range of web services, I have found myself making extensive 
> use of the LSIDs that are specific to each data source.  For example, for 
> ITIS, I use the Ecsenius bicolor LSID: urn:lsid:itis.gov:itis_tsn:636326
>
> For WoRMS, the LSID for Ecsenius bicolor is: 
> urn:lsid:marinespecies.org:taxname:277652
>
> For the Atlas of Living Australia, the LSID for Ecsenius bicolor is:
> urn:lsid:biodiversity.org.au:afd.taxon:99c29e7c-5b04-4e57-8e6b-82aa442a801a
>
> Is there a GBIF LSID that can similarly be used as a unique identifier for a 
> taxon? I have come across the various GBIF unique keys but these are not 
> unique outside of the GBIF environment and within the Gaia Guide systems I am 
> deciding how best to work with these, ensuring their uniqueness, alongside 
> identifiers from other data sources.
>
> Thanks again for your assistance.
>
> Geoff Shuetrim
> Gaia Guide Association
> http://www.gaiaguide.info/
> _______________________________________________
> API-users mailing list
> API-users at lists.gbif.org
> http://lists.gbif.org/mailman/listinfo/api-users
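For Geoff's question of keeping identifiers from several sources unique side by side, here is a minimal sketch. It parses the LSIDs quoted above into their `urn:lsid:<authority>:<namespace>:<object>[:<revision>]` parts and turns a bare GBIF key into the portal URI Markus recommends. Treating any all-digit string as a GBIF key is an assumption that only holds for this sketch's inputs.

```python
from typing import NamedTuple, Optional


class LSID(NamedTuple):
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str] = None


def parse_lsid(urn: str) -> LSID:
    """Split urn:lsid:<authority>:<namespace>:<object>[:<revision>]."""
    parts = urn.split(":")
    if len(parts) not in (5, 6) or parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError(f"not an LSID: {urn!r}")
    return LSID(*parts[2:])


def as_global_id(value: str) -> str:
    """Keep LSIDs as they are; turn a bare GBIF backbone key into the portal URI."""
    if value.lower().startswith("urn:lsid:"):
        parse_lsid(value)  # validate only
        return value
    if value.isdigit():    # assumption: bare integers here are GBIF backbone keys
        return f"http://www.gbif.org/species/{value}"
    raise ValueError(f"unrecognised identifier: {value!r}")


itis = parse_lsid("urn:lsid:itis.gov:itis_tsn:636326")
print(itis.authority, itis.object_id)  # itis.gov 636326
```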

