Re: RRID -> SciCrunch

Michael Crusoe Wed, 25 Oct 2017 09:53:26 -0700

2017-10-25 19:21 GMT+03:00 Steffen Möller <[email protected]>:

>
> On 25.10.17 17:49, Michael Crusoe wrote:
> >
> >
> > 2017-10-25 18:19 GMT+03:00 Matus Kalas <[email protected]
> > <mailto:[email protected]>>:
> >
> >     On 2017-10-25 15:12, Michael Crusoe wrote:
> >
> >         2017-10-25 16:04 GMT+03:00 Steffen Möller
> >         <[email protected] <mailto:[email protected]>>:
> >
> >             On 25.10.17 13:47, Michael Crusoe wrote:
> >
> >
> >
> >                 2017-10-25 14:34 GMT+03:00 Steffen Möller
> >                 <[email protected] <mailto:[email protected]>
> >                 <mailto:[email protected]
> >                 <mailto:[email protected]>>>:
> >
> >
> >                 On 25.10.17 10:56, Michael Crusoe wrote:
> >
> >                     Sorry, I missed the bit where we are deprecating
> >                     RRID. Can someone explain?
> >
> >
> >                 Because of
> >                 https://arxiv.org/ftp/arxiv/papers/1707/1707.03659.pdf [1]
> >                 and some web googling from which I gathered that the
> >                 "Research Resource IDentifiers" are not only provided
> >                 by SciCrunch. Admittedly, I fail to find that page now
> >                 that I want to find it :o/
> >
> >
> >                 There is no conflict here. scicrunch.org
> >                 <http://scicrunch.org> [2] is the post-pilot phase of
> >                 what is described in that paper.
> >
> >
> >
> >                 Personally, I could not care less, let those catalog
> >                 providers fight that out among themselves. However, I
> >                 find that the notion of SciCrunch clearly identifies
> >                 the provenance of that information, while "RRID" to me
> >                 is more of a concept coined by
> >                 (https://www.force11.org), not a provider. And with
> >                 several initiatives following the same purpose, I
> >                 found that by using SciCrunch not RRID, we would be
> >                 the most provider-neutral. And then again, it is only
> >                 something local to the Debian packaging, not publicly
> >                 visible, so nobody should truly care and the use of
> >                 SciCrunch imho serves us best on a technical level.
> >
> >
> >                 RRIDs share a single namespace that allows for
> >                 multiple providers, SciCrunch being the current main
> >                 provider for software tools and databases, with other
> >                 registries responsible for the other types. By
> >                 referring to RRIDs generically, there is no conflict.
> >
> >                 See
> >                 https://www.ebi.ac.uk/miriam/main/datatypes/MIR:00000558 [3]
> >                 for an overview
> >
> >                 Please rename this field to RRID, or better yet just
> >                 have a list of URIs like we do in CWL so you don't
> >                 have to care if it is a RRID, DOI or whatever :-)
> >
> >                 http://www.commonwl.org/v1.0/CommandLineTool.html#SoftwarePackage [4]
> >
> >
> >             This is what we are doing. The field is called "Registry"
> >             (not RRID, so we can also refer to Wikis and other
> >             catalogs) and allows for an arbitrary unordered number of
> >             (Name, Entry) tuples, in complete analogy to the CWL, I
> >             tend to think.
> >
> >
> >         Well, no. In CWL we don't separate the provider from the
> >         identifier.
> >         That's the whole point about COOL URIs.
> >         I've CC'd Stian as he explains this better than I do.
> >
> >
> >     The problem is that from the applied IDs, only Bio.Tools provide
> >     COOL URIs. Other providers should, but they don't, at least not
> >     yet. Thus a provider + ID pair is, unfortunately, necessary.
> >
> >
> > RRIDs have a COOL URI form: https://identifiers.org/rrid/RRID:SCR_001156
>
>
> This is the one we are generating on the task page from the information
> Name: SciCrunch, Entry: SCR_001156. And as of today, the same would be
> generated if the entry was with Name: RRID.
>
> I think we would make a mistake to give up the distinction, not least
> because we could then no longer query the UDD for, e.g., entries that
> have an RRID but no bio.tools entry.



I'm not convinced that this proposal makes that impossible.
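
To illustrate, here is a minimal sketch in Python. It assumes each package
simply carries a list of URIs; the package names, the placeholder RRID and
the bio.tools URIs below are made up for illustration, and this is not how
the UDD is actually queried. The point is only that the provider can still
be recovered from the URI prefix:

```python
# Hypothetical data: package name -> list of registry URIs.
# Only the khmer RRID is taken from this thread; the rest are placeholders.
packages = {
    "pkg-a": ["https://identifiers.org/rrid/RRID:SCR_001156"],
    "pkg-b": [
        "https://identifiers.org/rrid/RRID:SCR_EXAMPLE",
        "https://bio.tools/pkg-b",
    ],
    "pkg-c": ["https://bio.tools/pkg-c"],
}

RRID_PREFIX = "https://identifiers.org/rrid/"
BIOTOOLS_PREFIX = "https://bio.tools/"


def has_prefix(uris, prefix):
    """Return True if any URI in the list starts with the given prefix."""
    return any(uri.startswith(prefix) for uri in uris)


def rrid_without_biotools(pkgs):
    """Names of packages that have an RRID URI but no bio.tools URI."""
    return sorted(
        name
        for name, uris in pkgs.items()
        if has_prefix(uris, RRID_PREFIX) and not has_prefix(uris, BIOTOOLS_PREFIX)
    )


print(rrid_without_biotools(packages))
```

With the made-up data above this prints ['pkg-a'].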


> >
> >
> >
> >     As a side-track: You mentioned DOIs, Michael.
> >     Would it make sense if Debian (Med) adds DOIs as citable links to
> >     upstream releases, in addition to the upstream version and
> >     upstream repo information?
> >     And in any case, DOIs for both the upstream project as a whole
> >     (i.e. all releases), and/or for the particular releases, can be
> >     added as citations for a src package, if package maintainer or
> >     upstream wish so.
> >
> >
> > Of course, the more links the merrier. In fact, one could write a
> > simple script to autofill most of the upstream/metadata fields from
> > any and all available identifiers and DOIs.
> >
> > A quick recap for those following along:
> > Software identifiers are for the concept of a particular piece of
> > software. They are persistent regardless of 1) the version of the
> > software 2) the release of a paper for major new functionality or 3)
> > switching to a new repository.
> >
> > khmer will always be identifiable
> > by https://identifiers.org/rrid/RRID:SCR_001156 regardless of new
> > releases, new papers, or new hosting platforms
> >
> > DOIs are currently used to identify point in time digital objects like
> > papers, or a certain source code release.
> > It is true that some services that issue DOIs for software releases,
> > like Zenodo and FigShare, do have a "primary" DOI that each release
> > derives from. But that becomes insufficient as one might switch
> > between those services or to another provider.
> >
> > Back to your suggestion: the next step is to determine the best place
> > to put these per-source DOIs within Debian.
> >
> > Do we add them to
> > 1) debian/upstream/metadata as part of a version:DOI dictionary?
> Interesting. I do not immediately see how this fits that format, though.
> > 2) debian/changelog for each release or just for the
> > ${upstreamversion}-1 release?
> ${upstreamversion}-1,  preferred over 1)
> > 3) the binary package control file in the binary
> > packages? https://www.debian.org/doc/debian-policy/#s-binarycontrolfiles
> may be required for indexing
> > 4) the debian source control file (.dsc)?
> > https://www.debian.org/doc/debian-policy/#debian-source-control-files-dsc
> yes, again for indexing
> > and/or
> > 5) someplace else?
>
> Just to have it discussed: debian/copyright
>
> But today the version of the software is not specified in that document
> and we do not like changing the copyright file when the copyright has
> not changed. So, debian/copyright is inferior to debian/changelog IMHO.
>
> >
> > We can help ourselves answer this question by determining how and for
> > what purposes we might want to access these DOIs
> > 1) From a running system, as part of a citation/provenance query?
> Yes, and there is some prior art to this. The package "devscripts"
> provides the tool "wnpp-alert" that shows all those packages that are
> orphaned or requested to be adopted and installed on that system. We
> could come up with the same for any CWL workflow/wrapper - but for that
> we would not need this information shipping with the Debian packages
> themselves but could work with the catalogs directly.
> > 2) From Umegaya / the Ultimate Debian Database (UDD) ?
>
> yes, this is the place to consult for the DOI assignments.
>
> Especially when the workflow is hidden away in some container, the query
> should be performed in an OS-independent manner - like via the UDD.
>
> > 3) Someplace else and/or some other purpose?
>
> If those DOI help with the communication between the various catalogs /
> software repositories / papers, then I presume that this is mostly
> outside of our immediate control.
>

Reminder, we are talking about per-version DOIs here. The easiest way to
communicate between databases about the idea of a piece of software is a
software identifier.


>
>
> >
> >
> >
> >
> >
> >             Related to your comment (and very, very close to my heart)
> >             is the question if we do everything sufficiently well to
> >             map the CWL workflows to Debian packages. We could for
> >             instance add references to CWL-workflow-database-entries
> >             for the workflows that a particular Debian package is used
> >             in, so we can test them when the package updates - er,
> >             before the package updates in the distribution.
> >
> >
> >         We are good here; you can determine the packages used in any
> >         given CWL
> >         description that includes a SoftwareRequirement that is mappable
> >         directly or indirectly to a package.
> >
> >         For automated testing you would need a way to specify "normal" or
> >         expected results; CWL v1.0.x doesn't have that concept. A
> >         researchobject.org <http://researchobject.org> [16] RO that
> >         contains/references those results with
> >         the corresponding CWL workflow would however fulfill this role.
> >
> >
> >     And another side-track: In addition to CWL workflows and using
> >     them as test (requiring some input-output pairs and equality
> >     relation), would it make sense for Debian to link to some kind of
> >     "CWL wrappers" for the single tools?
> >
> >
> > Instead of linking we can include them in the package, like we do unix
> > manual pages.
> It is what I had also suggested. Maybe we can come up with an
> auto-update with a dh-cwl helper when there is internet access?
> >
> > See the section of the spec about where to find CWL tool descriptions:
> > http://www.commonwl.org/v1.0/CommandLineTool.html#Discovering_CWL_documents_on_a_local_filesystem
> >
> > Perhaps this should be added to the Debian-Med policy as a bonus item
> > for packages? samtools already ships some descriptions
> >
> > $ apt-file search /usr/share/commonwl
> > samtools: /usr/share/commonwl/samtools-faidx.cwl
> > samtools: /usr/share/commonwl/samtools-index.cwl
> > samtools: /usr/share/commonwl/samtools-rmdup.cwl
> > samtools: /usr/share/commonwl/samtools-sort.cwl
> > samtools: /usr/share/commonwl/samtools-view.cwl
> I was not aware of those - excellent!
> >
> >
> >     That is again similar to the elsewhere-discussed proposal of
> >     generating (and/or linking to) software containers (Docker,
> >     Singularity, rkt?)...
> >
> > Software containers can be generated fairly automatically and don't
> > really benefit from upstream's participation.
> Let us see how this develops. For instance, I anticipate that most
> issues that Debian packages run into when there are new versions out,
> will also affect the BioConda community. Via OMICtools we have an
> indirect mapping from Debian packages to BioConda. We could make that a
> more direct one. That way we could mutually learn about issues with
> particular new versions that affect various auto-generated
> Docker/Singularity images.
>

Ah, now you're talking about linking to other packaging systems, which I
support.
However, with software identifiers being adopted by both debian-med and
bioconda, the linking becomes implicit.


> > CWL tool descriptions can and should be maintained collectively;
> > preferably they are offered to upstream for inclusion just like other
> > Debian instigated patches and manual pages are sent up.
> I agree. And in a way this is why I find it problematic to statically
> ship those wrappers when there are newer versions already available on
> the CWL github. We need an update mechanism, I think, not only at build
> time but also for the already installed packages - but then again, this
> very much contradicts the concepts of a stable release. So, I still need
> to make my mind up about this all.
>

CWL tool descriptions will stabilize quickly enough. CWL executors are not
required to use the descriptions in /usr/share/commonwl (or any other
location); they merely assist users in getting started with the software
already on their system. At any time users can write their own, download a
different one, or copy and improve the system-installed version.
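
As a rough sketch of what "getting started" could look like, the Python
below lists the tool descriptions found on a system. The three directories
are my reading of the "Discovering CWL documents on a local filesystem"
section of the spec and should be checked against it; the function names
are hypothetical and not part of any CWL executor.

```python
import os
from pathlib import Path


def cwl_search_dirs(environ=None):
    """Candidate directories for CWL descriptions (assumed from the spec)."""
    env = os.environ if environ is None else environ
    # Fall back to ~/.local/share when XDG_DATA_HOME is unset or empty.
    data_home = env.get("XDG_DATA_HOME") or str(Path.home() / ".local" / "share")
    return [
        Path("/usr/share/commonwl"),
        Path("/usr/local/share/commonwl"),
        Path(data_home) / "commonwl",
    ]


def find_cwl_descriptions(dirs):
    """Return all *.cwl files found directly inside the given directories."""
    found = []
    for directory in dirs:
        if directory.is_dir():
            found.extend(sorted(directory.glob("*.cwl")))
    return found


if __name__ == "__main__":
    for path in find_cwl_descriptions(cwl_search_dirs()):
        print(path)
```

On a system with samtools installed this should list the same files that
the apt-file search quoted earlier shows under /usr/share/commonwl.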


> >
> >     Back to the topic: I agree with Steffen that if we mean the link
> >     pairs as Provider + ID (as opposed to ID_type + ID_value), then
> >     SciCrunch makes more sense than RRID.
>

From https://identifiers.org/rrid/RRID:SCR_001156

"Proper citation

khmer, RRID:SCR_001156"

So please don't strip off RRID :-)
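
In code terms, a minimal sketch (Python; the function name and the set of
accepted Name values are my assumptions, based on the Name/Entry pairs
discussed above) that keeps the RRID: prefix when building the URI:

```python
IDENTIFIERS_ORG_RRID = "https://identifiers.org/rrid/"

# Assumption: both spellings of the Name field map to the same namespace.
RRID_NAMES = ("RRID", "SciCrunch")


def rrid_uri(name, entry):
    """Build the identifiers.org URI for a (Name, Entry) registry pair."""
    if name not in RRID_NAMES:
        raise ValueError(f"not an RRID registry name: {name!r}")
    # Keep the 'RRID:' prefix; add it only when the entry lacks it.
    curie = entry if entry.startswith("RRID:") else "RRID:" + entry
    return IDENTIFIERS_ORG_RRID + curie


print(rrid_uri("SciCrunch", "SCR_001156"))
# prints https://identifiers.org/rrid/RRID:SCR_001156
```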



> >
> >
> >     Cheers,
> >     Matus
> >
> >
> >
> >
> > --
> > Michael R. Crusoe
> > Co-founder & Lead,
> > Common Workflow Language project <http://www.commonwl.org/>
> > https://impactstory.org/u/0000-0002-2961-9670
> > [email protected] <mailto:[email protected]>
> > +1 480 627 9108
>
>


-- 
Michael R. Crusoe
Co-founder & Lead,
Common Workflow Language project <http://www.commonwl.org/>
https://impactstory.org/u/0000-0002-2961-9670
[email protected]
+1 480 627 9108
