Hello Andreas,

> Hi Hervé,
>
> thanks for your explanation. When I wrote my previous mail in response
> to Steffen I had not yet read this, since I usually read
> thread-based.

I had also replied to the initial thread when Hervé's introduction arrived in my inbox :)

> On Wed, Feb 04, 2015 at 08:46:07AM +0000, Hervé Ménager wrote:
> > Dear all,
> >
> > As some of you already know, the ELIXIR registry (
> > http://elixir-registry.cbs.dtu.dk) is a project that aims at gathering an
> > extensive list of bioinformatics tools and services, and publishing them as a
> > web-based database where users can search and locate relevant
> > bioinformatics resources. There is work, initiated by Steffen Möller and
> > Tim Booth, to enable the automatic registration of debian-med packages as
> > resources in this registry. I worked a bit with Steffen and Tim on this
> > interface during the debian-med sprint.
>
> Thanks for working on this.

And please also all have a look at https://lists.debian.org/debian-med/2014/11/msg00070.html

> > The way it is currently done a new metadata file, called edam (for now), in
> > the "upstream" directory of the package source, contains additional
> > information not present in other files such as metadata, control, etc. The
> > script parses all these files to produce a JSON file used to register the
> > debian package in the registry. However, there are, as Andreas pointed out
> > today, at least two problems with this approach:
> > - it includes the creation of an additional file to store information,
> > potentially breaking downstream operations where this file is unexpected,
>
> I think this is not a problem. As Charles said in my response he just
> needed to do a "minimum diff upload" to get an RC bug fix accepted by
> the release team. Dumping another file into debian/upstream would
> otherwise be considered harmless. My major point was that the effort
> itself will be more successful if you point people to it in advance to
> let them contribute to this effort as well.

So far the placement was only technically motivated, not socially, and, yes, of course, sure, certainly, ...

> > - it requires parsing multiple files with different formats, a long and
> > cumbersome task...
> > As a complete stranger to the debian packaging process, I would like
> > your opinion on these points:
> > - should we create this additional file or add the information in
> > another existing file? The goal here is obviously to reduce the
> > number of files which have to be edited, while minimizing the risk of
> > breaking anything in the packaging architecture.
>
> The debian/upstream dir itself is quite new and not yet used by many
> teams. Charles started it for injecting publication data and over time
> and is documented in the Wiki[1]. I'm personally not sure whether we
> should invent a new file (edam) there or whether it is fine to use the
> already existing metadata file for this kind of information. It mainly
> depends on the planned application and the way it should be maintained.

I expressed my strong preference for the separate file. At the sprint I supported the idea of allowing multiple such files when there are multiple packages that differ significantly in what they provide.

> > - should we try, rather than parsing these files, to retrieve
> > the information from the UDD? I'd personally prefer this option to the
> > "parse n files" one, but it would also require adding the new information
> > to the DB.
>
> At some point in time the n files need to be parsed. However, this
> is a solved problem for debian/upstream/metadata. For the moment I
> just extract the Reference data from it but I also intend to take over
> fields Cite-As, Funding and others. The decision what field is parsed
> from my point of view is application-driven: I needed to put the
> citation data (field "Reference") online on the tasks pages and thus
> I spent my time to do the needed work. So if we have a reasonable
> application for further data we should invent a sensible table layout
> and import these data.
> From my point of view we can thus put edam
> data right into the metadata file (*after* documenting it on the Wiki
> page[1]) or we can add another file (*and* create an according Wiki
> page). It would be simple to gather also these additional files in
> the same job as other machine readable files are processed.

Only now, after the sprint, something emerges that can be documented, IMHO.

> What continuously remains unclear to me is for what purpose we gather
> these data. The following random questions are popping up in my mind:
>
> 0. Is it just fun to collect metadata?

To me, EDAM is a simplistic language to describe what our packages are capable of helping with. It is somewhat rewarding to prepare such a formal description ... but only for the first few packages. The larger motivation lies in using those terms to describe workflows and then find tools for the job. Actually chaining those tools up with the correct command line options to process the data properly is yet another task.

> 1. Do we just gather them to help the EDAM database get even more
> metadata than we have (like descriptions, dependencies, etc.)?
> That's fine but then we should provide them in the best possible
> form *for* EDAM to be accessed (whatever this might be).

Our Subversion and Git repositories, or the source packages, are perfectly acceptable.

> 2. Do we want to base installation methods on a certain set of
> EDAM fields? (I remember times when it was possible to install
> packages based on DebTags but I can't find this any more :-()

Yes. That, and I envision containers (VMs, Docker, cloud instances) exploiting the annotation.

> 3. Do we want to change our Debian Med task design on EDAM tags?

The consistency across our blends is more important than any fancy gimmicks, I tend to think. But if it fits - I would not mind.
But I hope that a suitable presentation of the availability of Debian packages for particular tasks/software on a central page like the ELIXIR catalog will help us more than any fiddling with our package presentation.

> I think we should make up our mind what exactly we want to approach
> to finally enhance the user experience.

My prime ambition is to come straight from the external-to-Debian catalog of software in computational biology to "us" - and this shall mean Bio-Linux, Ubuntu and Debian alike. We still need to decide upon a landing page for any such external pointer to one or more Debian packages. An early stage could return the "apt-get install" command plus descriptions of our packages, or a pointer to a VM featuring those packages, or ... I see many options.

Best,

Steffen

