2017-10-25 18:19 GMT+03:00 Matus Kalas <[email protected]>:

> On 2017-10-25 15:12, Michael Crusoe wrote:
>
>> 2017-10-25 16:04 GMT+03:00 Steffen Möller <[email protected]>:
>>
>>> On 25.10.17 13:47, Michael Crusoe wrote:
>>>
>>>> 2017-10-25 14:34 GMT+03:00 Steffen Möller <[email protected]>:
>>>>
>>>>> On 25.10.17 10:56, Michael Crusoe wrote:
>>>>>
>>>>>> Sorry, I missed the bit where we are deprecating RRID. Can
>>>>>> someone explain?
>>>>>
>>>>> Because of
>>>>> https://arxiv.org/ftp/arxiv/papers/1707/1707.03659.pdf [1]
>>>>> and some web googling from which I gathered that the "Research
>>>>> Resource IDentifiers" are not only provided by SciCrunch.
>>>>> Admittedly, I fail to find that page now that I want to find
>>>>> it :o/
>>>>
>>>> There is no conflict here. scicrunch.org [2] is the post-pilot
>>>> phase of what is described in that paper.
>>>>
>>>>> Personally, I could not care less, let those catalog providers
>>>>> fight that out among themselves. However, I find that the notion
>>>>> of SciCrunch clearly identifies the provenance of that
>>>>> information, while "RRID" to me is more of a concept coined by
>>>>> (https://www.force11.org), not a provider. And with several
>>>>> initiatives following the same purpose, I found that by using
>>>>> SciCrunch not RRID, we would be the most provider-neutral. And
>>>>> then again, it is only something local to the Debian packaging,
>>>>> not publicly visible, so nobody should truly care and the use of
>>>>> SciCrunch imho serves us best on a technical level.
>>>>
>>>> RRIDs share a single name space that allow for multiple providers,
>>>> sci crunch being the current main provider for software tools and
>>>> databases and other registries responsible for the other types. By
>>>> referring to RRIDs generically then there is no conflict.
>>>>
>>>> See https://www.ebi.ac.uk/miriam/main/datatypes/MIR:00000558 [3]
>>>> for an overview
>>>>
>>>> Please rename this field to RRID, or better yet just have a list of
>>>> URIs like we do in CWL so you don't have to care if it is a RRID,
>>>> DOI or whatever :-)
>>>>
>>>> http://www.commonwl.org/v1.0/CommandLineTool.html#SoftwarePackage [4]
>>>
>>> This is what we are doing. The field is called "Registry" (not RRID,
>>> so we can also refer to Wikis and other catalogs) and allows for an
>>> arbitrary unordered number of (Name, Entry) tupels, in complete
>>> analogy to the CWL, I tend to think.
>>
>> Well, no. In CWL we don't separate the provider from the identifier.
>> That's the whole point about COOL URIs.
>> I've CC'd Stian as he explains this better than I do.
>
> The problem is that from the applied IDs, only Bio.Tools provide COOL
> URIs. Other providers should, but they don't, at least not yet. Thus a
> provider + ID pair is, unfortunately, necessary.

RRIDs have a COOL URI form: https://identifiers.org/rrid/RRID:SCR_001156

> As a side-track: You mentioned DOIs, Michael.
> Would it make sense if Debian (Med) adds DOIs as citable links to upstream
> releases, in addition to the upstream version and upstream repo information?
> And in any case, DOIs for both the upstream project as a whole (i.e. all
> releases), and/or for the particular releases, can be added as citations
> for a src package, if package maintainer or upstream wish so.

Of course, the more links the merrier. In fact, one could write a simple
script to autofill most of the upstream/metadata fields from any and all
available identifiers and DOIs.

A quick recap for those following along:

Software identifiers are for the concept of a particular piece of software.
They are persistent regardless of
1) the version of the software,
2) the release of a paper for major new functionality, or
3) switching to a new repository.

khmer will always be identifiable by
https://identifiers.org/rrid/RRID:SCR_001156 regardless of new releases,
new papers, or new hosting platforms.

DOIs are currently used to identify point-in-time digital objects like
papers, or a certain source code release. It is true that some services
that issue DOIs for software releases, like Zenodo and FigShare, do have a
"primary" DOI that each release derives from. But that becomes insufficient
as one might switch between those services or to another provider.

Back to your suggestion: the next step is to determine the best place to
put these per-source DOIs within Debian. Do we add them to
1) debian/upstream/metadata as part of a version:DOI dictionary?
2) debian/changelog for each release or just for the ${upstreamversion}-1
   release?
3) the binary package control file in the binary packages?
   https://www.debian.org/doc/debian-policy/#s-binarycontrolfiles
4) the Debian source control file (.dsc)?
   https://www.debian.org/doc/debian-policy/#debian-source-control-files-dsc
and/or
5) someplace else?

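To make option 1 a bit more concrete, here is a minimal Python sketch of
what a version:DOI dictionary could look like and how a tool might look up
the DOI for an installed package version. The field name "Release-DOI", the
helper names, and the DOIs are all made up for illustration (the DOIs are
placeholders, not real identifiers); nothing like this field exists in
debian/upstream/metadata today. The only rule taken from Debian is how the
upstream part is cut out of a package version: an optional epoch before the
first colon, and the Debian revision after the last hyphen.

```python
from typing import Dict, Optional


def upstream_version(debian_version: str) -> str:
    """Return the upstream part of a Debian package version string."""
    version = debian_version
    # An optional epoch is everything before the first colon.
    if ":" in version:
        version = version.split(":", 1)[1]
    # The Debian revision is everything after the last hyphen.
    if "-" in version:
        version = version.rsplit("-", 1)[0]
    return version


def doi_for(debian_version: str, release_dois: Dict[str, str]) -> Optional[str]:
    """Look up the per-release DOI for a package version, if one is recorded."""
    return release_dois.get(upstream_version(debian_version))


def render_metadata_fragment(release_dois: Dict[str, str],
                             field: str = "Release-DOI") -> str:
    """Render the dictionary as a YAML mapping (the field name is hypothetical)."""
    lines = [f"{field}:"]
    for version in sorted(release_dois):
        lines.append(f'  "{version}": "{release_dois[version]}"')
    return "\n".join(lines) + "\n"


# Placeholder data: these are NOT real DOIs.
EXAMPLE_DOIS = {
    "2.0": "10.0000/placeholder.2.0",
    "2.1.1": "10.0000/placeholder.2.1.1",
}

if __name__ == "__main__":
    print(render_metadata_fragment(EXAMPLE_DOIS))
    print(doi_for("1:2.0-3", EXAMPLE_DOIS))
```

Note that a repacked upstream version such as 2.1.1+dfsg would not match
the key 2.1.1 in this sketch, so a real implementation would have to decide
how to normalise such suffixes.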
We can help ourselves answer this question by determining how and for what
purposes we might want to access these DOIs:
1) From a running system, as part of a citation/provenance query?
2) From Umegaya / the Ultimate Debian Database (UDD)?
3) Someplace else and/or some other purpose?

>>> Related to your comment (and very, very close to my heart) is the
>>> question if we do everything sufficiently well to map the CWL
>>> workflows to Debian packages. We could for instance add references to
>>> CWL-workflow-database-entries for the workflows that a particular
>>> Debian packages is used in, so we can test them when the package
>>> updates - er, before the package updates in the distribution.
>>
>> We are good here; you can determine the packages used in any given CWL
>> description that includes a SoftwareRequirement that is mappable
>> directly or indirectly to a package.
>>
>> For automated testing you would need a way to specify "normal" or
>> expected results; CWL v1.0.x doesn't have that concept. A
>> researchobject.org [16] RO that contains/references those results with
>> the corresponding CWL workflow would however fulfill this role.
>
> And another side-track: In addition to CWL workflows and using them as
> test (requiring some input-output pairs and equality relation), would it
> make sense for Debian to link to some kind of "CWL wrappers" for the single
> tools?

Instead of linking we can include them in the package, as we do with Unix
manual pages. See the section of the spec about where to find CWL tool
descriptions:
http://www.commonwl.org/v1.0/CommandLineTool.html#Discovering_CWL_documents_on_a_local_filesystem

Perhaps this should be added to the Debian-Med policy as a bonus item for
packages?

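For anyone who wants to check what a given system already provides, here is
a rough Python sketch of that lookup. It assumes the search locations are
/usr/share/commonwl/, /usr/local/share/commonwl/ and
$XDG_DATA_HOME/commonwl/ (falling back to ~/.local/share/commonwl), which is
my reading of the spec section linked above; please check the spec before
relying on it. The function names are made up for this example.

```python
import os
from pathlib import Path
from typing import Iterable, List, Mapping, Optional


def commonwl_dirs(environ: Optional[Mapping[str, str]] = None) -> List[Path]:
    """Directories to search for CWL documents (assumed from the CWL v1.0 spec)."""
    env = os.environ if environ is None else environ
    # XDG_DATA_HOME usually defaults to ~/.local/share when unset or empty.
    data_home = env.get("XDG_DATA_HOME") or str(Path.home() / ".local" / "share")
    return [
        Path("/usr/share/commonwl"),
        Path("/usr/local/share/commonwl"),
        Path(data_home) / "commonwl",
    ]


def find_tool_descriptions(dirs: Iterable[Path]) -> List[Path]:
    """Collect *.cwl files from every search directory that exists."""
    found: List[Path] = []
    for directory in dirs:
        if directory.is_dir():
            found.extend(sorted(directory.glob("*.cwl")))
    return found


if __name__ == "__main__":
    for description in find_tool_descriptions(commonwl_dirs()):
        print(description)
```

Run on a machine with a package that installs into /usr/share/commonwl, this
should simply list the shipped .cwl files.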
samtools already ships some descriptions:

$ apt-file search /usr/share/commonwl
samtools: /usr/share/commonwl/samtools-faidx.cwl
samtools: /usr/share/commonwl/samtools-index.cwl
samtools: /usr/share/commonwl/samtools-rmdup.cwl
samtools: /usr/share/commonwl/samtools-sort.cwl
samtools: /usr/share/commonwl/samtools-view.cwl

> That is again similar to the elsewhere-discussed proposal of generating
> (and/or linking to) software containers (Docker, Singularity, rkt?)...

Software containers can be generated fairly automatically and don't really
benefit from upstream's participation. CWL tool descriptions can and should
be maintained collectively; preferably they are offered to upstream for
inclusion, just as other Debian-instigated patches and manual pages are
sent upstream.

> Back to the topic: I agree with Steffen that if we mean the link pairs as
> Provider + ID (as opposed to ID_type + ID_value), then SciCrunch makes more
> sense than RRID.
>
> Cheers,
> Matus

-- 
Michael R. Crusoe
Co-founder & Lead, Common Workflow Language project <http://www.commonwl.org/>
https://impactstory.org/u/0000-0002-2961-9670
[email protected]
+1 480 627 9108

