Re: RRID -> SciCrunch

Michael Crusoe Wed, 25 Oct 2017 09:34:55 -0700

2017-10-25 18:54 GMT+03:00 Steffen Möller <[email protected]>:

>
> On 25.10.17 17:19, Matus Kalas wrote:
> > On 2017-10-25 15:12, Michael Crusoe wrote:
> >> 2017-10-25 16:04 GMT+03:00 Steffen Möller <[email protected]>:
> >>
> >>> On 25.10.17 13:47, Michael Crusoe wrote:
> >>>>
> >>>>
> >>>> 2017-10-25 14:34 GMT+03:00 Steffen Möller <[email protected]
> >>>> <mailto:[email protected]>>:
> >>>>
> >>>>
> >>>> On 25.10.17 10:56, Michael Crusoe wrote:
> >>>>> Sorry, I missed the bit where we are deprecating RRID. Can
> >>> someone
> >>>>> explain?
> >>>>
> >>>> Because of https://arxiv.org/ftp/arxiv/papers/1707/1707.03659.pdf
> >>>> and some web googling from which I gathered that the "Research
> >>>> Resource IDentifiers" are not only provided by SciCrunch. Admittedly,
> >>>> I fail to find that page now that I want to find it :o/
> >>>>
> >>>>
> >>>> There is no conflict here. scicrunch.org [2]
> >>> <http://scicrunch.org> is the
> >>>> post-pilot phase of what is described in that paper.
> >>>>
> >>>>
> >>>>
> >>>> Personally, I could not care less, let those catalog providers
> >>> fight
> >>>> that out among themselves. However, I find that the notion of
> >>>> SciCrunch
> >>>> clearly identifies the provenance of that information, while
> >>> "RRID" to
> >>>> me is more of a concept coined by (https://www.force11.org),
> >>> not a
> >>>> provider. And with several initiatives following the same
> >>> purpose, I
> >>>> found that by using SciCrunch not RRID, we would be the most
> >>>> provider-neutral. And then again, it is only something local
> >>> to the
> >>>> Debian packaging, not publicly visible, so nobody should truly
> >>>> care and
> >>>> the use of SciCrunch imho serves us best on a technical level.
> >>>>
> >>>>
> >>>> RRIDs share a single namespace that allows for multiple providers,
> >>>> SciCrunch being the current main provider for software tools and
> >>>> databases, and other registries responsible for the other types. By
> >>>> referring to RRIDs generically, there is no conflict.
> >>>>
> >>>> See https://www.ebi.ac.uk/miriam/main/datatypes/MIR:00000558 [3]
> >>> for an
> >>>> overview
> >>>>
> >>>> Please rename this field to RRID, or better yet just have a list
> >>> of
> >>>> URIs like we do in CWL so you don't have to care if it is a RRID,
> >>> DOI
> >>>> or whatever :-)
> >>>>
> >>>> http://www.commonwl.org/v1.0/CommandLineTool.html#SoftwarePackage
> >>> [4]
> >>>>
> >>> This is what we are doing. The field is called "Registry" (not RRID,
> >>> so we can also refer to wikis and other catalogs) and allows for an
> >>> arbitrary, unordered number of (Name, Entry) tuples, in complete
> >>> analogy to the CWL, I tend to think.
> >>
> >> Well, no. In CWL we don't separate the provider from the identifier.
> >> That's the whole point about COOL URIs.
> >> I've CC'd Stian as he explains this better than I do.
> >
> > The problem is that of the applied IDs, only Bio.Tools provides COOL
> > URIs. Other providers should, but they don't, at least not yet. Thus a
> > provider + ID pair is, unfortunately, necessary.
> >
> > As a side-track: You mentioned DOIs, Michael.
> > Would it make sense if Debian (Med) adds DOIs as citable links to
> > upstream releases, in addition to the upstream version and upstream
> > repo information?
> > And in any case, DOIs for both the upstream project as a whole (i.e.
> > all releases), and/or for the particular releases, can be added as
> > citations for a src package, if package maintainer or upstream wish so.
>
> The task page already presents traditional references and at least the
> metadata file also harvests their DOIs. I have read that an entry in
> OMICtools would also have a DOI, but I admit not to have seen that yet,
> and because I am confused enough already, I have not looked for it, either.
>
> When we now add DOIs that indicate a particular version of the software,
> I admit I embrace that for the nice semantics behind it (like a
> feature having been added at a particular point in time), but then again
> ... ouch. We will have some individuals who reference software by its
> DOI and others by a version. And how do you express with a DOI that a
> particular feature is no longer present? Trivial with a version number
> since it is ordered. And now someone tells me that DOIs are also ordered
> - they most likely are, but close to nobody thinks about it that way, I
> presume.
>
>
> >
> >
> >>> Related to your comment (and very, very close to my heart) is the
> >>> question of whether we do everything sufficiently well to map the CWL
> >>> workflows to Debian packages. We could for instance add references to
> >>> CWL-workflow-database entries for the workflows that a particular
> >>> Debian package is used in, so we can test them when the package
> >>> updates - er, before the package updates in the distribution.
> >>
> >> We are good here; you can determine the packages used in any given CWL
> >> description that includes a SoftwareRequirement that is mappable
> >> directly or indirectly to a package.
> >>
> >> For automated testing you would need a way to specify "normal" or
> >> expected results; CWL v1.0.x doesn't have that concept. A
> >> researchobject.org [16] RO that contains/references those results with
> >> the corresponding CWL workflow would however fulfill this role.
> >>
>
> I am a fan. Yes, please! RO, go, go, go! Let us complete one. While
> writing this down I somehow sensed that I got the dependencies wrong.
> So, please correct me. I initially saw:
>
>  * Debian package that features the CWL, happily an auto-created Debian
> package from a CWL-database.
>
>    - using only Debian packages to perform the workflow
>
>    - we are a bit weak on the Debianisation of public data that we are
> likely to need for those tests
>
>  * Auto-created test(s) for that workflow added from the RO collection.
>
>  * Submitted to Debian as a package
>
>  * Control the results from ci.debian.org
>
>
> I am not so confident that we have a 1:n relationship between CWL
> workflows and ROs. It is more like one RO integrating a subset of
> workflows, right? This will render it all a bit more complicated, as in
> CWL-representing Debian packages depending on each other, but I still
> like it.
>


A Research Object (RO) is a data container or data manifest. It is the
recommended method of communicating the results of a CWL workflow. As a
single CWL workflow can produce varying outputs based upon varying inputs,
there can naturally be many different ROs describing those different outputs.
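
A minimal sketch of that one-to-many relation, in Python. Everything here
is an illustrative assumption: the ResearchObject class, its fields, the
group_by_workflow helper and the file names are hypothetical and are not
the researchobject.org manifest format or any CWL API.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ResearchObject:
    """Hypothetical stand-in for an RO manifest (not the real RO schema)."""
    workflow: str   # identifier of the CWL workflow that was run
    inputs: tuple   # inputs used for this particular run
    outputs: tuple  # results recorded for this particular run


def group_by_workflow(research_objects):
    """Collect all ROs that describe runs of the same workflow."""
    grouped = defaultdict(list)
    for ro in research_objects:
        grouped[ro.workflow].append(ro)
    return dict(grouped)


# Two runs of the same (made-up) workflow with different inputs
# yield two distinct ROs.
ros = [
    ResearchObject("align.cwl", ("sample_a.fastq",), ("sample_a.bam",)),
    ResearchObject("align.cwl", ("sample_b.fastq",), ("sample_b.bam",)),
]
by_workflow = group_by_workflow(ros)
```

Any automated test would therefore have to pick a specific RO (one set of
inputs plus the expected outputs), not just name the workflow.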


>
> How should we name those Debian packages that are auto-created to
> represent a CWL workflow? This depends on the database from which we
> derive the package, right? Do we agree that the ROs are not appearing as
> packages themselves?


I do not think it makes sense to package particular workflows as a Debian
package, except as shared example workflows to be Recommended by the various
CWL runners.


>
> > That is again similar to the elsewhere-discussed proposal of
> > generating (and/or linking to) software containers (Docker,
> > Singularity, rkt?)...
>
> We should reference them, too.  I am a bit uncertain if we should
> distinguish between a software container that installs a Debian package
> and a container that installs a Conda package.


What's the value of referencing an outside software container from a Debian
package? They are easy enough to make by hand and are soon going to be
autogenerated.


>
> > Back to the topic: I agree with Steffen that if we mean the link pairs
> > as Provider + ID (as opposed to ID_type + ID_value), then SciCrunch
> > makes more sense than RRID.
>
> Nice to hear in the sense that we would not have to change them all back
> again. Let us wait to learn whether Stian has an opinion on it.
>
> @Matúš, I'd like to open another thread on how to proceed with the EDAM
> annotation of packages in Debian to help structuring our task pages. My
> immediate thought was to just take the Topic and then store the whole
> path (there was a path, right?) of your ontology until it gets to that
> topic as a header for all the tools that share that topic.
>
> The annoying bit would then be that the upload into the UDD (that Debian
> database from which the task page is derived) would need to know about
> the EDAM ontology. But that would not scale with all the other Debian
> packages of other disciplines. So we would need to somewhere generate
> the full path in an automated fashion as a derived field (since you may
> reorganise your ontology) or we live with the specifier of the topic
> alone. But then we are weaker on the semantics. Ideas welcome.
>
> Steffen
>
>
>


-- 
Michael R. Crusoe
Co-founder & Lead,
Common Workflow Language project <http://www.commonwl.org/>
https://impactstory.org/u/0000-0002-2961-9670
[email protected]
+1 480 627 9108
