2017-10-25 19:21 GMT+03:00 Steffen Möller <[email protected]>:
> On 25.10.17 17:49, Michael Crusoe wrote:
> > 2017-10-25 18:19 GMT+03:00 Matus Kalas <[email protected]>:
> > > On 2017-10-25 15:12, Michael Crusoe wrote:
> > > > 2017-10-25 16:04 GMT+03:00 Steffen Möller <[email protected]>:
> > > > > On 25.10.17 13:47, Michael Crusoe wrote:
> > > > > > 2017-10-25 14:34 GMT+03:00 Steffen Möller <[email protected]>:
> > > > > > > On 25.10.17 10:56, Michael Crusoe wrote:
> > > > > > > > Sorry, I missed the bit where we are deprecating RRID. Can someone explain?
> > > > > > >
> > > > > > > Because of https://arxiv.org/ftp/arxiv/papers/1707/1707.03659.pdf and some web googling, from which I gathered that "Research Resource IDentifiers" are not only provided by SciCrunch. Admittedly, I fail to find that page now that I want to find it :o/
> > > > > >
> > > > > > There is no conflict here. scicrunch.org is the post-pilot phase of what is described in that paper.
> > > > >
> > > > > Personally, I could not care less; let those catalog providers fight that out among themselves. However, I find that the notion of SciCrunch clearly identifies the provenance of that information, while "RRID" to me is more of a concept coined by FORCE11 (https://www.force11.org), not a provider. And with several initiatives serving the same purpose, I found that by using SciCrunch rather than RRID, we would be the most provider-neutral.
> > > > > And then again, it is only something local to the Debian packaging, not publicly visible, so nobody should truly care, and the use of SciCrunch IMHO serves us best on a technical level.
> > > > >
> > > > > > RRIDs share a single namespace that allows for multiple providers, SciCrunch being the current main provider for software tools and databases, with other registries responsible for the other types. By referring to RRIDs generically, there is no conflict.
> > > > > >
> > > > > > See https://www.ebi.ac.uk/miriam/main/datatypes/MIR:00000558 for an overview.
> > > > > >
> > > > > > Please rename this field to RRID, or better yet just have a list of URIs like we do in CWL so you don't have to care if it is an RRID, DOI or whatever :-) http://www.commonwl.org/v1.0/CommandLineTool.html#SoftwarePackage
> > > > >
> > > > > This is what we are doing. The field is called "Registry" (not RRID, so we can also refer to Wikis and other catalogs) and allows for an arbitrary, unordered number of (Name, Entry) tuples, in complete analogy to the CWL, I tend to think.
> > > >
> > > > Well, no. In CWL we don't separate the provider from the identifier. That's the whole point about COOL URIs. I've CC'd Stian as he explains this better than I do.
> > >
> > > The problem is that, of the IDs in use, only Bio.Tools provides COOL URIs. Other providers should, but they don't, at least not yet. Thus a provider + ID pair is, unfortunately, necessary.
> >
> > RRIDs have a COOL URI form: https://identifiers.org/rrid/RRID:SCR_001156
>
> This is the one we are generating on the task page from the information Name: SciCrunch, Entry: SCR_001156.
> And as of today, the same URI would be generated if the entry had Name: RRID.
>
> I think we would make a mistake to give up the distinction, not least because we could then no longer query the UDD for, e.g., entries that have an RRID but no bio.tools entry.

I'm not convinced that this proposal makes that impossible.

> > > As a side-track: You mentioned DOIs, Michael. Would it make sense if Debian (Med) added DOIs as citable links to upstream releases, in addition to the upstream version and upstream repo information? And in any case, DOIs for both the upstream project as a whole (i.e. all releases) and/or for the particular releases can be added as citations for a src package, if the package maintainer or upstream wishes so.
> >
> > Of course, the more links the merrier. In fact, one could write a simple script to autofill most of the upstream/metadata fields from any and all available identifiers and DOIs.
> >
> > A quick recap for those following along:
> > Software identifiers are for the concept of a particular piece of software. They are persistent regardless of 1) the version of the software, 2) the release of a paper for major new functionality, or 3) switching to a new repository.
> >
> > khmer will always be identifiable by https://identifiers.org/rrid/RRID:SCR_001156 regardless of new releases, new papers, or new hosting platforms.
> >
> > DOIs are currently used to identify point-in-time digital objects like papers, or a certain source code release. It is true that some services that issue DOIs for software releases, like Zenodo and FigShare, do have a "primary" DOI that each release derives from. But that becomes insufficient as one might switch between those services or to another provider.
> >
> > Back to your suggestion: the next step is to determine the best place to put these per-source DOIs within Debian.
> >
> > Do we add them to
> > 1) debian/upstream/metadata as part of a version:DOI dictionary?
>
> Interesting. I do not immediately see how this fits that format, though.
>
> > 2) debian/changelog for each release or just for the ${upstreamversion}-1 release?
>
> ${upstreamversion}-1, preferred over 1)
>
> > 3) the binary package control file in the binary packages? https://www.debian.org/doc/debian-policy/#s-binarycontrolfiles
>
> may be required for indexing
>
> > 4) the Debian source control file (.dsc)? https://www.debian.org/doc/debian-policy/#debian-source-control-files-dsc
>
> yes, again for indexing
>
> > and/or
> > 5) someplace else?
>
> Just to have it discussed: debian/copyright.
>
> But today the version of the software is not specified in that document, and we do not like changing the copyright file when the copyright has not changed. So, debian/copyright is inferior to debian/changelog IMHO.
>
> > We can help ourselves answer this question by determining how and for what purposes we might want to access these DOIs:
> > 1) From a running system, as part of a citation/provenance query?
>
> Yes, and there is some prior art to this. The package "devscripts" provides the tool "wnpp-alert", which shows all packages installed on that system that are orphaned or up for adoption. We could come up with the same for any CWL workflow/wrapper - but for that we would not need this information shipped with the Debian packages themselves; we could work with the catalogs directly.
>
> > 2) From Umegaya / the Ultimate Debian Database (UDD)?
>
> Yes, this is the place to consult for the DOI assignments. Especially when the workflow is hidden away in some container, the query should be performed in an OS-independent manner - like via the UDD.
>
> > 3) Someplace else and/or some other purpose?
>
> If those DOIs help with the communication between the various catalogs / software repositories / papers, then I presume that this is mostly outside of our immediate control.

Reminder, we are talking about per-version DOIs here. The easiest way to communicate between databases about the idea of a piece of software is a software identifier.
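
The thread contrasts (Name, Entry) Registry tuples with CWL-style lists of URIs. A minimal Python sketch of how a Registry tuple could be turned into the identifiers.org COOL URI quoted above, assuming only the two Names discussed here (SciCrunch and RRID) and that both map to the same RRID form, as Steffen describes for the task page. The table and function name are hypothetical; this is not the actual task-page or UDD code.

```python
from typing import Optional

# Hypothetical table: debian/upstream/metadata "Registry" Name -> URI template.
# Only the identifiers.org RRID form quoted in this thread is assumed here;
# SciCrunch entries are treated as RRIDs.
URI_TEMPLATES = {
    "RRID": "https://identifiers.org/rrid/RRID:{entry}",
    "SciCrunch": "https://identifiers.org/rrid/RRID:{entry}",
}


def registry_to_uri(name: str, entry: str) -> Optional[str]:
    """Return a COOL URI for a Registry (Name, Entry) tuple, or None if unknown."""
    template = URI_TEMPLATES.get(name)
    if template is None:
        # No known URI form: the caller has to keep the (Name, Entry) pair.
        return None
    # Accept entries written with or without the "RRID:" prefix.
    if entry.startswith("RRID:"):
        entry = entry[len("RRID:"):]
    return template.format(entry=entry)


if __name__ == "__main__":
    for name in ("SciCrunch", "RRID"):
        print(name, "->", registry_to_uri(name, "SCR_001156"))
```

Both Names yield https://identifiers.org/rrid/RRID:SCR_001156. Any Name without a known URI form returns None, which is exactly the case where, as Matus argues, the provider + ID pair has to be kept.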

> > > > > Related to your comment (and very, very close to my heart) is the question of whether we do everything sufficiently well to map the CWL workflows to Debian packages. We could, for instance, add references to CWL workflow database entries for the workflows that a particular Debian package is used in, so we can test them when the package updates - er, before the package updates in the distribution.
> > > >
> > > > We are good here; you can determine the packages used in any given CWL description that includes a SoftwareRequirement that is mappable directly or indirectly to a package.
> > > >
> > > > For automated testing you would need a way to specify "normal" or expected results; CWL v1.0.x doesn't have that concept. A researchobject.org RO that contains/references those results along with the corresponding CWL workflow would, however, fulfill this role.
> > >
> > > And another side-track: In addition to CWL workflows and using them as tests (requiring some input-output pairs and an equality relation), would it make sense for Debian to link to some kind of "CWL wrappers" for the single tools?
> >
> > Instead of linking we can include them in the package, like we do Unix manual pages.
>
> It is what I had also suggested. Maybe we can come up with an auto-update with a dh-cwl helper when there is internet access?
>
> > See the section of the spec about where to find CWL tool descriptions: http://www.commonwl.org/v1.0/CommandLineTool.html#Discovering_CWL_documents_on_a_local_filesystem
> >
> > Perhaps this should be added to the Debian-Med policy as a bonus item for packages?
> > samtools already ships some descriptions:
> >
> > $ apt-file search /usr/share/commonwl
> > samtools: /usr/share/commonwl/samtools-faidx.cwl
> > samtools: /usr/share/commonwl/samtools-index.cwl
> > samtools: /usr/share/commonwl/samtools-rmdup.cwl
> > samtools: /usr/share/commonwl/samtools-sort.cwl
> > samtools: /usr/share/commonwl/samtools-view.cwl
>
> I was not aware of those - excellent!
>
> > > That is again similar to the elsewhere-discussed proposal of generating (and/or linking to) software containers (Docker, Singularity, rkt?)...
> >
> > Software containers can be generated fairly automatically and don't really benefit from upstream's participation.
>
> Let us see how this develops. For instance, I anticipate that most issues that Debian packages run into when there are new versions out will also affect the BioConda community. Via OMICtools we have an indirect mapping from Debian packages to BioConda. We could make that a more direct one. That way we could mutually learn about issues with particular new versions that affect various auto-generated Docker/Singularity images.

Ah, now you're talking about linking to other packaging systems, which I support. However, with software identifiers being adopted by both debian-med and bioconda, the linking becomes implicit.

> > CWL tool descriptions can and should be maintained collectively; preferably they are offered to upstream for inclusion, just like other Debian-instigated patches and manual pages are sent up.
>
> I agree. And in a way this is why I find it problematic to statically ship those wrappers when there are newer versions already available on the CWL GitHub. We need an update mechanism, I think, not only at build time but also for the already installed packages - but then again, this very much contradicts the concept of a stable release. So, I still need to make up my mind about all this.

CWL tool descriptions will stabilize quickly enough.
CWL executors are not required to use the descriptions in /usr/share/commonwl (or any other location); they merely help users get started with the software already on their system. At any time, users can write their own, download a different one, or copy and improve the system-installed version.

> > > Back to the topic: I agree with Steffen that if we mean the link pairs as Provider + ID (as opposed to ID_type + ID_value), then SciCrunch makes more sense than RRID.

From https://identifiers.org/rrid/RRID:SCR_001156: "Proper citation khmer, RRID:SCR_001156". So please don't strip off RRID :-)

> > > Cheers,
> > > Matus

-- 
Michael R. Crusoe
Co-founder & Lead, Common Workflow Language project <http://www.commonwl.org/>
https://impactstory.org/u/0000-0002-2961-9670
[email protected]
+1 480 627 9108
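
The thread points to the CWL v1.0 section on discovering CWL documents on a local filesystem, and samtools already installs descriptions under /usr/share/commonwl. A minimal Python sketch of how a local tool might list them, assuming the XDG base-directory convention: $XDG_DATA_HOME (defaulting to ~/.local/share), then each entry of $XDG_DATA_DIRS (defaulting to /usr/local/share:/usr/share), each with a commonwl/ subdirectory. The function names are hypothetical, and the spec section remains the authority on the exact search order.

```python
import os
from pathlib import Path
from typing import List


def commonwl_dirs() -> List[Path]:
    """Candidate directories for installed CWL tool descriptions.

    Assumes XDG base-directory defaults; see the CWL v1.0 spec section
    "Discovering CWL documents on a local filesystem" for the exact rules.
    """
    data_home = os.environ.get("XDG_DATA_HOME") or str(Path.home() / ".local" / "share")
    data_dirs = os.environ.get("XDG_DATA_DIRS") or "/usr/local/share:/usr/share"
    roots = [data_home] + [d for d in data_dirs.split(":") if d]
    return [Path(root) / "commonwl" for root in roots]


def find_cwl_documents() -> List[Path]:
    """List *.cwl files in every candidate directory that exists."""
    found: List[Path] = []
    for directory in commonwl_dirs():
        if directory.is_dir():
            found.extend(sorted(directory.glob("*.cwl")))
    return found


if __name__ == "__main__":
    # On a system with samtools installed, this is expected to list
    # the /usr/share/commonwl/samtools-*.cwl files.
    for path in find_cwl_documents():
        print(path)
```

Executors remain free to ignore these directories; the sketch only shows where system-installed descriptions would be looked up under the stated assumptions.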

