Hi Hervé,

thanks for your explanation. When I wrote my previous mail in response to
Steffen I had not yet read this one, since I usually read thread by thread.

On Wed, Feb 04, 2015 at 08:46:07AM +0000, Hervé Ménager wrote:
> Dear all,
>
> As some of you already know, the ELIXIR registry (
> http://elixir-registry.cbs.dtu.dk) is a project that aims at gathering an
> extensive list of bioinformatics tools and services, and publish them as a
> web-based database where users can search and locate relevant
> bioinformatics resources. There is a work, initiated by Steffen Möller and
> Tim Booth, to enable the automatic registration of debian-med packages as
> resources in this registry. I worked a bit with Steffen and Tim on this
> interface during the debian-med sprint.

Thanks for working on this.

> The way it is currently done a new metadata file, called edam (for now), in
> the "upstream" directory of the package source, contains additional
> information not present in other files such as metadata, control, etc. The
> script parses all these files to produce a JSON file used to register the
> debian package in the registry. However, there are, as Andreas pointed out
> today, at least two problems with this approach:
> - it includes the creation of an additional file to store information,
> potentially breaking downstream operations where this file is unexpected,

I do not think this is a problem. As Charles said in his response, he just
needed to do a "minimum diff upload" to get an RC bug fix accepted by the
release team. Dumping another file into debian/upstream would otherwise be
considered harmless. My main point was that the effort will be more
successful if you point people to it in advance, so that they can contribute
as well.

> - it requires parsing multiple files with different formats, long and
> cumbersome task...
> As a complete stranger to the debian packaging process, I would like
> your opinion on these points:
> - should we create this additional file or add the information in
> another existing files?
> The goal here is obviously to reduce the
> number of files which have to be edited, while minimizing the risk of
> breaking anything in the packaging architecture.

The debian/upstream directory itself is quite new and not yet used by many
teams. Charles started it for injecting publication data, and it is now
documented in the Wiki[1]. I'm personally not sure whether we should invent
a new file (edam) there or whether it is fine to use the already existing
metadata file for this kind of information. It mainly depends on the planned
application and the way it should be maintained.

> - should we try, rather than parsing these files, to retrieve
> the information from the UDD? I'd personally prefer this option to the
> "parse n files" one, but it would also require to add the new information
> to the DB.

At some point in time the n files need to be parsed. However, this is a
solved problem for debian/upstream/metadata. For the moment I just extract
the Reference data from it, but I also intend to take over the fields
Cite-As, Funding and others. Which fields are parsed is, from my point of
view, application-driven: I needed to put the citation data (field
"Reference") online on the tasks pages, and thus I spent my time on the
necessary work. So if we have a reasonable application for further data, we
should design a sensible table layout and import these data.

From my point of view we can thus put the edam data right into the metadata
file (*after* documenting it on the Wiki page[1]), or we can add another
file (*and* create a corresponding Wiki page). It would be simple to gather
these additional files in the same job that processes the other
machine-readable files.

What still remains unclear to me is for what purpose we gather these data.
The following random questions are popping up in my mind:

0. Is it just fun to collect metadata?
1. Do we just gather them to help the EDAM database get even more metadata
   than we have (like descriptions, dependencies, etc.)?
   That's fine, but then we should provide them in the best possible form
   *for* EDAM to access (whatever this might be).
2. Do we want to base installation methods on a certain set of EDAM fields?
   (I remember times when it was possible to install packages based on
   DebTags, but I can't find this any more :-()
3. Do we want to change our Debian Med task design based on EDAM tags?

I think we should make up our minds about what exactly we want to achieve
in order to finally enhance the user experience.

> Thanks a lot in advance for sharing your remarks and opinion on this.

Thanks also to you for your contribution to the sprint. It was nice to get
to know you.

Kind regards

      Andreas.

[1] https://wiki.debian.org/UpstreamMetadata

--
http://fam-tille.de

