2017-10-25 18:54 GMT+03:00 Steffen Möller:
>
> On 25.10.17 17:19, Matus Kalas wrote:
> > On 2017-10-25 15:12, Michael Crusoe wrote:
> >> 2017-10-25 16:04 GMT+03:00 Steffen Möller:
> >>
> >>> On 25.10.17 13:47, Michael Crusoe wrote:
> >>>>
> >>>> 2017-10-25 14:34 GMT+03:00 Steffen Möller:
> >>>>
> >>>> On 25.10.17 10:56, Michael Crusoe wrote:
> >>>>> Sorry, I missed the bit where we are deprecating RRID. Can someone
> >>>>> explain?
> >>>>
> >>>> Because of https://arxiv.org/ftp/arxiv/papers/1707/1707.03659.pdf [1]
> >>>> and some web googling from which I gathered that the "Research
> >>>> Resource IDentifiers" are not only provided by SciCrunch. Admittedly,
> >>>> I fail to find that page now that I want to find it :o/
> >>>>
> >>>> There is no conflict here. scicrunch.org [2] is the post-pilot phase
> >>>> of what is described in that paper.
> >>>>
> >>>> Personally, I could not care less; let those catalog providers fight
> >>>> that out among themselves. However, I find that the notion of
> >>>> SciCrunch clearly identifies the provenance of that information,
> >>>> while "RRID" to me is more of a concept coined by FORCE11
> >>>> (https://www.force11.org), not a provider. And with several
> >>>> initiatives following the same purpose, I found that by using
> >>>> SciCrunch, not RRID, we would be the most provider-neutral. And then
> >>>> again, it is only something local to the Debian packaging, not
> >>>> publicly visible, so nobody should truly care, and the use of
> >>>> SciCrunch imho serves us best on a technical level.
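The difference between a (Name, Entry) pair and a single resolvable URI, as debated in this thread, can be illustrated with a minimal Python sketch. The URL templates below are assumptions for illustration only (each provider's own documentation is authoritative), and the example entries are hypothetical. A provider without a known template yields no URI, which is the situation Matúš describes for registries that do not yet offer stable URIs.

```python
"""Sketch: turn (Name, Entry) registry pairs into resolvable URIs."""

# Assumed URL templates, for illustration only; verify each resolver
# pattern against the provider's documentation before relying on it.
URL_TEMPLATES = {
    "DOI": "https://doi.org/{entry}",
    "bio.tools": "https://bio.tools/{entry}",
    "SciCrunch": "https://scicrunch.org/resolver/{entry}",
}


def registry_to_uris(pairs):
    """Map (name, entry) pairs to URIs; providers without a template map to None."""
    uris = []
    for name, entry in pairs:
        template = URL_TEMPLATES.get(name)
        uris.append(template.format(entry=entry) if template else None)
    return uris


if __name__ == "__main__":
    # Hypothetical entries, not real registry records.
    example = [
        ("bio.tools", "example-tool"),
        ("SciCrunch", "SCR_000000"),
        ("OMICtools", "OMICS_00000"),
    ]
    for uri in registry_to_uris(example):
        print(uri)
```

Keeping the pair and deriving the URI on demand leaves room for providers that only publish opaque IDs, at the cost of maintaining the template table.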
> >>>>
> >>>> RRIDs share a single namespace that allows for multiple providers,
> >>>> SciCrunch being the current main provider for software tools and
> >>>> databases, and other registries being responsible for the other
> >>>> types. By referring to RRIDs generically, there is no conflict.
> >>>>
> >>>> See https://www.ebi.ac.uk/miriam/main/datatypes/MIR:00000558 [3] for
> >>>> an overview.
> >>>>
> >>>> Please rename this field to RRID, or better yet, just have a list of
> >>>> URIs like we do in CWL so you don't have to care whether it is an
> >>>> RRID, a DOI or whatever :-)
> >>>>
> >>>> http://www.commonwl.org/v1.0/CommandLineTool.html#SoftwarePackage [4]
> >>>>
> >>> This is what we are doing. The field is called "Registry" (not RRID,
> >>> so we can also refer to wikis and other catalogs) and allows for an
> >>> arbitrary, unordered number of (Name, Entry) tuples, in complete
> >>> analogy to CWL, I tend to think.
> >>
> >> Well, no. In CWL we don't separate the provider from the identifier.
> >> That's the whole point of COOL URIs.
> >> I've CC'd Stian as he explains this better than I do.
> >
> > The problem is that, of the IDs in use, only Bio.Tools provides COOL
> > URIs. Other providers should, but they don't, at least not yet. Thus a
> > provider + ID pair is, unfortunately, necessary.
> >
> > As a side-track: you mentioned DOIs, Michael.
> > Would it make sense for Debian (Med) to add DOIs as citable links to
> > upstream releases, in addition to the upstream version and upstream
> > repo information?
> > And in any case, DOIs for the upstream project as a whole (i.e. all
> > releases) and/or for particular releases can be added as citations
> > for a source package, if the package maintainer or upstream wish so.
>
> The task page already presents traditional references, and at least the
> metadata file also harvests their DOIs.
> I have read that an entry in OMICtools would also have a DOI, but I
> admit not to have seen that yet, and because I am confused enough
> already, I have not looked for it, either.
>
> As for adding DOIs that indicate a particular version of the software:
> I admit I embrace that for the nice semantics behind it (like a feature
> having been added at a particular point in time), but then again ...
> ouch. We will have some individuals who reference a piece of software by
> its DOI and others by its version. And how do you express with a DOI
> that a particular feature is no longer present? That is trivial with a
> version number, since versions are ordered. And now someone tells me
> that DOIs are also ordered. They most likely are, but hardly anybody
> thinks about them that way, I presume.
>
> >
> >>> Related to your comment (and very, very close to my heart) is the
> >>> question of whether we do everything sufficiently well to map CWL
> >>> workflows to Debian packages. We could, for instance, add references
> >>> to CWL workflow database entries for the workflows that a particular
> >>> Debian package is used in, so we can test them when the package
> >>> updates - er, before the package updates in the distribution.
> >>
> >> We are good here; you can determine the packages used in any given CWL
> >> description that includes a SoftwareRequirement that is mappable
> >> directly or indirectly to a package.
> >>
> >> For automated testing you would need a way to specify "normal" or
> >> expected results; CWL v1.0.x doesn't have that concept. A
> >> researchobject.org [16] RO that contains/references those results with
> >> the corresponding CWL workflow would, however, fulfill this role.
> >>
>
> I am a fan. Yes, please! RO, go, go, go! Let us complete one. While
> writing this down I somehow sensed that I got the dependencies wrong.
> So, please correct me.
> I initially saw:
>
> * A Debian package that features the CWL workflow, preferably an
>   auto-created Debian package from a CWL database
>
>   - using only Debian packages to perform the workflow
>
>   - we are a bit weak on the Debianisation of public data that we are
>     likely to need for those tests
>
> * Auto-created test(s) for that workflow, added from the RO collection
>
> * Submitted to Debian as a package
>
> * Results checked on ci.debian.org
>
> I am not so confident that we have a 1:n relationship between CWL
> workflows and ROs. It is more like one RO integrating a subset of
> workflows, right? This will render it all a bit more complicated, as in
> CWL-representing Debian packages depending on each other, but I still
> like it.
>
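Michael's point that the packages can be read off any CWL description carrying a SoftwareRequirement can be illustrated with a minimal Python sketch. It assumes the document has already been loaded as JSON (YAML would need an extra parser), only inspects the top-level `requirements` and `hints` (not the steps of a workflow), and expects `packages` in its list form. The tool description and its bio.tools URI are illustrative assumptions, not a real registry record. Mapping the extracted names or `specs` URIs to Debian package names is deliberately left out, since that mapping is what the thread is still discussing.

```python
import json


def _entries(section):
    """Normalise a CWL requirements/hints section to a list of dicts.

    CWL accepts either a list of objects with a "class" field or a map
    keyed by class name; both forms are handled here.
    """
    if isinstance(section, dict):
        return [{**value, "class": key} for key, value in section.items()]
    return list(section or [])


def software_packages(cwl_doc):
    """Collect (package, versions, specs) from every top-level SoftwareRequirement."""
    found = []
    for key in ("requirements", "hints"):
        for entry in _entries(cwl_doc.get(key)):
            if entry.get("class") != "SoftwareRequirement":
                continue
            # Assumes the list form of "packages"; the map form is not handled.
            for pkg in entry.get("packages", []):
                found.append((pkg["package"], pkg.get("version", []), pkg.get("specs", [])))
    return found


# Illustrative tool description (assumed for this example, not a real record).
EXAMPLE = json.loads("""
{
  "cwlVersion": "v1.0",
  "class": "CommandLineTool",
  "baseCommand": "bwa",
  "inputs": [],
  "outputs": [],
  "hints": {
    "SoftwareRequirement": {
      "packages": [
        {"package": "bwa", "version": ["0.7.17"], "specs": ["https://bio.tools/bwa"]}
      ]
    }
  }
}
""")

if __name__ == "__main__":
    print(software_packages(EXAMPLE))
```

A real tool would also recurse into the `steps` of a Workflow and resolve `run` references, which is where the n:m relationship between workflows and packages shows up.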
A research object (RO) is a data container or data manifest. It is the
recommended method of communicating the results of a CWL workflow. As a
single CWL workflow can produce varying outputs based upon varying inputs,
there can naturally be many different ROs describing those different
outputs.

>
> How should we name those Debian packages that are auto-created to
> represent a CWL workflow? This depends on the database from which we
> derive the package, right? Do we agree that the ROs are not appearing as
> packages themselves?

I do not think it makes sense to package particular workflows as Debian
packages, except as shared example workflows to be Recommended by the
various CWL runners.

>
> > That is again similar to the elsewhere-discussed proposal of
> > generating (and/or linking to) software containers (Docker,
> > Singularity, rkt?)...
>
> We should reference them, too. I am a bit uncertain whether we should
> distinguish between a software container that installs a Debian package
> and a container that installs a Conda package.

What's the value of referencing an outside software container from a
Debian package? They are easy enough to make by hand and are soon going to
be autogenerated.

>
> > Back to the topic: I agree with Steffen that if we mean the link pairs
> > as Provider + ID (as opposed to ID_type + ID_value), then SciCrunch
> > makes more sense than RRID.
>
> Nice to hear, in the sense that we would not have to change them all
> back again. Let us wait to learn whether Stian has any strong opinion
> about it.
>
> @Matúš, I'd like to open another thread on how to proceed with the EDAM
> annotation of packages in Debian to help structure our task pages. My
> immediate thought was to just take the Topic and then store the whole
> path (there was a path, right?) of your ontology until it gets to that
> topic as a header for all the tools that share that topic.
>
> The annoying bit would then be that the upload into the UDD (the
> Ultimate Debian Database, from which the task page is derived) would
> need to know about the EDAM ontology. But that would not scale to all
> the other Debian packages of other disciplines. So we would need to
> generate the full path somewhere in an automated fashion as a derived
> field (since you may reorganise your ontology), or else live with the
> topic specifier alone. But then we are weaker on the semantics. Ideas
> welcome.
>
> Steffen

-- 
Michael R. Crusoe
Co-founder & Lead, Common Workflow Language project
<http://www.commonwl.org/>
https://impactstory.org/u/0000-0002-2961-9670
+1 480 627 9108
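Steffen's suggestion to generate the full EDAM path automatically as a derived field, rather than teaching the UDD about EDAM, could be computed roughly as in this minimal Python sketch. It assumes the ontology has already been reduced to a map from each concept to its direct parents; the labels are placeholders, not real EDAM topics. Because a concept may have more than one parent, it returns every root-to-topic path, and it assumes the hierarchy contains no cycles.

```python
def paths_to_root(concept, parents):
    """Return every path from a root concept down to `concept`.

    `parents` maps a concept label to the list of its direct parents; a
    concept with no entry (or an empty list) is treated as a root.
    Assumes the hierarchy is acyclic.
    """
    direct = parents.get(concept, [])
    if not direct:
        return [[concept]]
    paths = []
    for parent in direct:
        for path in paths_to_root(parent, parents):
            paths.append(path + [concept])
    return paths


# Hypothetical toy hierarchy; labels are placeholders, not real EDAM terms.
TOY_PARENTS = {
    "Topic B": ["Topic A"],
    "Topic C": ["Topic A"],
    "Topic D": ["Topic B", "Topic C"],
}

if __name__ == "__main__":
    for path in paths_to_root("Topic D", TOY_PARENTS):
        print(" > ".join(path))
```

Recomputing the paths at import time from the current ontology release would keep the derived field in step if the ontology is reorganised; returning all paths means a tool could appear under several task-page headers, which may or may not be wanted.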

