Re: RRID -> SciCrunch

Michael Crusoe Wed, 25 Oct 2017 08:56:26 -0700

2017-10-25 18:19 GMT+03:00 Matus Kalas <[email protected]>:

> On 2017-10-25 15:12, Michael Crusoe wrote:
>
>> 2017-10-25 16:04 GMT+03:00 Steffen Möller <[email protected]>:
>>
>> On 25.10.17 13:47, Michael Crusoe wrote:
>>>
>>>>
>>>>
>>>> 2017-10-25 14:34 GMT+03:00 Steffen Möller <[email protected]>:
>>>>
>>>>
>>>> On 25.10.17 10:56, Michael Crusoe wrote:
>>>>
>>>>> Sorry, I missed the bit where we are deprecating RRID. Can someone
>>>>> explain?
>>>>
>>>> Because of https://arxiv.org/ftp/arxiv/papers/1707/1707.03659.pdf [1]
>>>> and some web googling from which I gathered that the "Research Resource
>>>> IDentifiers" are not only provided by SciCrunch. Admittedly, I fail to
>>>> find that page now that I want to find it :o/
>>>>
>>>>
>>>> There is no conflict here. http://scicrunch.org [2] is the
>>>> post-pilot phase of what is described in that paper.
>>>>
>>>>
>>>>
>>>> Personally, I could not care less, let those catalog providers fight
>>>> that out among themselves. However, I find that the notion of SciCrunch
>>>> clearly identifies the provenance of that information, while "RRID" to
>>>> me is more of a concept coined by (https://www.force11.org), not a
>>>> provider. And with several initiatives following the same purpose, I
>>>> found that by using SciCrunch not RRID, we would be the most
>>>> provider-neutral. And then again, it is only something local to the
>>>> Debian packaging, not publicly visible, so nobody should truly care and
>>>> the use of SciCrunch imho serves us best on a technical level.
>>>>
>>>>
>>>> RRIDs share a single namespace that allows for multiple providers,
>>>> SciCrunch being the current main provider for software tools and
>>>> databases and other registries responsible for the other types. By
>>>> referring to RRIDs generically, there is no conflict.
>>>>
>>>> See https://www.ebi.ac.uk/miriam/main/datatypes/MIR:00000558 [3] for an
>>>> overview
>>>>
>>>> Please rename this field to RRID, or better yet just have a list of
>>>> URIs like we do in CWL so you don't have to care if it is a RRID, DOI
>>>> or whatever :-)
>>>>
>>>> http://www.commonwl.org/v1.0/CommandLineTool.html#SoftwarePackage [4]
>>>
>>>>
>>> This is what we are doing. The field is called "Registry" (not RRID, so
>>> we can also refer to Wikis and other catalogs) and allows for an
>>> arbitrary unordered number of (Name, Entry) tuples, in complete analogy
>>> to the CWL, I tend to think.
>>>
>>
>> Well, no. In CWL we don't separate the provider from the identifier.
>> That's the whole point about COOL URIs.
>> I've CC'd Stian as he explains this better than I do.
>>
>
> The problem is that, of the IDs in use, only Bio.Tools provides COOL
> URIs. Other providers should, but they don't, at least not yet. Thus a
> provider + ID pair is, unfortunately, necessary.
>


RRIDs have a COOL URI form: https://identifiers.org/rrid/RRID:SCR_001156


>
> As a side-track: You mentioned DOIs, Michael.
> Would it make sense if Debian (Med) adds DOIs as citable links to upstream
> releases, in addition to the upstream version and upstream repo information?
> And in any case, DOIs for both the upstream project as a whole (i.e. all
> releases), and/or for the particular releases, can be added as citations
> for a src package, if package maintainer or upstream wish so.
>

Of course, the more links the merrier. In fact, one could write a simple
script to autofill most of the upstream/metadata fields from any and all
available identifiers and DOIs.
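
As a rough illustration of the first step such a script would need, here is
a minimal Python sketch that sorts raw identifier strings into RRIDs and
DOIs and turns each into a resolvable URI. The regular expressions and the
function names classify and to_uri are my own assumptions, not an existing
tool. The RRID pattern only covers the SCR_ form seen in this thread, and
the DOI in the usage note is a made-up placeholder.

```python
import re

# Assumed patterns for illustration; real schemes have more variants.
RRID_RE = re.compile(r"^(?:RRID:)?(SCR_\d+)$")
DOI_RE = re.compile(
    r"^(?:doi:|https?://(?:dx\.)?doi\.org/)?(10\.\d{4,9}/\S+)$",
    re.IGNORECASE,
)


def classify(identifier):
    """Return a (kind, value) pair for a raw identifier string."""
    identifier = identifier.strip()
    match = RRID_RE.match(identifier)
    if match:
        return ("RRID", match.group(1))
    match = DOI_RE.match(identifier)
    if match:
        return ("DOI", match.group(1))
    return ("unknown", identifier)


def to_uri(kind, value):
    """Build a resolvable URI for a classified identifier."""
    if kind == "RRID":
        return "https://identifiers.org/rrid/RRID:" + value
    if kind == "DOI":
        return "https://doi.org/" + value
    # Unknown identifiers are passed through unchanged.
    return value
```

For example, to_uri(*classify("RRID:SCR_001156")) gives the khmer URI
mentioned in this thread, and classify("doi:10.1234/example") is reported
as a DOI. Filling the actual upstream/metadata fields from the resolved
records is left out of this sketch.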

A quick recap for those following along:
Software identifiers are for the concept of a particular piece of software.
They are persistent regardless of 1) the version of the software, 2) the
release of a paper for major new functionality, or 3) switching to a new
repository.

khmer will always be identifiable by
https://identifiers.org/rrid/RRID:SCR_001156 regardless of new releases,
new papers, or new hosting platforms.

DOIs are currently used to identify point-in-time digital objects like
papers, or a certain source code release.
It is true that some services that issue DOIs for software releases, like
Zenodo and FigShare, do have a "primary" DOI that each release derives
from. But that is insufficient as a persistent software identifier, because
a project might switch between those services or to another provider.

Back to your suggestion: the next step is to determine the best place to
put these per-source DOIs within Debian.

Do we add them to
1) debian/upstream/metadata as part of a version:DOI dictionary?
2) debian/changelog for each release or just for the ${upstreamversion}-1
   release?
3) the binary package control file in the binary packages?
   https://www.debian.org/doc/debian-policy/#s-binarycontrolfiles
4) the Debian source control file (.dsc)?
   https://www.debian.org/doc/debian-policy/#debian-source-control-files-dsc
and/or
5) someplace else?
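
To make option 1 a little more concrete, here is a hedged Python sketch of
how a version:DOI dictionary could be queried for an installed package
version. Nothing here is an agreed format: the RELEASE_DOIS mapping, the
DOIs in it, and both function names are placeholders, and stripping
"+dfsg"/"+ds" repack suffixes is only a heuristic.

```python
import re

# Placeholder data standing in for a version:DOI dictionary that might
# live in debian/upstream/metadata. These DOIs are not real.
RELEASE_DOIS = {
    "2.0": "10.1234/example.200",
    "2.1.1": "10.1234/example.211",
}


def upstream_version(debian_version):
    """Reduce a Debian version string to the upstream version."""
    # Drop an optional epoch such as "1:".
    version = debian_version.split(":", 1)[-1]
    # Drop the Debian revision, which follows the last hyphen.
    if "-" in version:
        version = version.rsplit("-", 1)[0]
    # Heuristic: drop repack suffixes such as "+dfsg" or "+ds1".
    return re.sub(r"[+~](dfsg|ds)[.\d]*$", "", version)


def doi_for(debian_version, release_dois):
    """Look up the DOI recorded for a package's upstream release."""
    return release_dois.get(upstream_version(debian_version))
```

With the placeholder data above, doi_for("2.1.1+dfsg-3", RELEASE_DOIS)
finds the entry for 2.1.1, while a version with no recorded DOI yields
None. The same lookup would work whether the dictionary is read on a
running system or from a central database.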

We can help ourselves answer this question by determining how and for what
purposes we might want to access these DOIs:
1) From a running system, as part of a citation/provenance query?
2) From Umegaya / the Ultimate Debian Database (UDD) ?
3) Someplace else and/or some other purpose?


>
>
>
>>> Related to your comment (and very, very close to my heart) is the
>>> question if we do everything sufficiently well to map the CWL workflows
>>> to Debian packages. We could for instance add references to
>>> CWL-workflow-database-entries for the workflows that a particular Debian
>>> package is used in, so we can test them when the package updates - er,
>>> before the package updates in the distribution.
>>>
>>
>> We are good here; you can determine the packages used in any given CWL
>> description that includes a SoftwareRequirement that is mappable
>> directly or indirectly to a package.
>>
>> For automated testing you would need a way to specify "normal" or
>> expected results; CWL v1.0.x doesn't have that concept. A
>> researchobject.org [16] RO that contains/references those results with
>> the corresponding CWL workflow would however fulfill this role.
>>
>>
> And another side-track: In addition to CWL workflows and using them as
> tests (requiring some input-output pairs and an equality relation), would
> it make sense for Debian to link to some kind of "CWL wrappers" for the
> single tools?
>

Instead of linking we can include them in the package, as we do with Unix
manual pages.

See the section of the spec about where to find CWL tool descriptions:
http://www.commonwl.org/v1.0/CommandLineTool.html#Discovering_CWL_documents_on_a_local_filesystem

Perhaps this should be added to the Debian-Med policy as a bonus item for
packages? samtools already ships some descriptions:

$ apt-file search /usr/share/commonwl
samtools: /usr/share/commonwl/samtools-faidx.cwl
samtools: /usr/share/commonwl/samtools-index.cwl
samtools: /usr/share/commonwl/samtools-rmdup.cwl
samtools: /usr/share/commonwl/samtools-sort.cwl
samtools: /usr/share/commonwl/samtools-view.cwl
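
For completeness, a small Python sketch of how a tool might discover such
descriptions on a running system. The three default directories are my
reading of the spec section linked above (please check it before relying
on this), and the function names are hypothetical.

```python
import os
from pathlib import Path


def default_search_dirs(environ=None):
    """Directories to search, as I understand the CWL v1.0 spec."""
    environ = os.environ if environ is None else environ
    # Fall back to ~/.local/share when XDG_DATA_HOME is unset or empty.
    data_home = environ.get("XDG_DATA_HOME") or os.path.join(
        os.path.expanduser("~"), ".local", "share"
    )
    return [
        Path("/usr/share/commonwl"),
        Path("/usr/local/share/commonwl"),
        Path(data_home) / "commonwl",
    ]


def find_cwl_descriptions(search_dirs=None):
    """Return every *.cwl file found directly in the search directories."""
    if search_dirs is None:
        search_dirs = default_search_dirs()
    found = []
    for directory in search_dirs:
        directory = Path(directory)
        # Missing directories are skipped silently.
        if directory.is_dir():
            found.extend(sorted(directory.glob("*.cwl")))
    return found
```

On a system with the samtools package installed, find_cwl_descriptions()
should list the files shown above. Passing an explicit list of directories
makes the function easy to exercise in a test or a package build.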


> That is again similar to the elsewhere-discussed proposal of generating
> (and/or linking to) software containers (Docker, Singularity, rkt?)...
>
Software containers can be generated fairly automatically and don't really
benefit from upstream's participation.

CWL tool descriptions can and should be maintained collectively; preferably
they are offered to upstream for inclusion, just as other Debian-instigated
patches and manual pages are sent upstream.

> Back to the topic: I agree with Steffen that if we mean the link pairs as
> Provider + ID (as opposed to ID_type + ID_value), then SciCrunch makes more
> sense than RRID.
>
>
> Cheers,
> Matus
>



-- 
Michael R. Crusoe
Co-founder & Lead,
Common Workflow Language project <http://www.commonwl.org/>
https://impactstory.org/u/0000-0002-2961-9670
[email protected]
+1 480 627 9108
