Dear Steffen and Tim,

This sounds cool and useful to help people find the right tool for a job in the long run! I for one would prefer the long representation, with one entry per binary. If I can find the time, I would even help update the metadata files with this kind of information once the format of the upstream/metadata file you are proposing is stable and documented.
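
To make the long form concrete, here is a rough Python sketch of how a consumer might read such a block. It assumes the flat "key: value" layout of the bowtie example quoted below, that each "scope:" line opens a new group for the lines that follow it, and that a value is an identifier, a blank, then a human-readable label. The function name parse_annotations and the layout of the result are made up for illustration and are not part of the proposal.

```python
from collections import defaultdict

# Shortened excerpt of the bowtie example from the quoted message.
SAMPLE = """\
Ontology: http://edamontology.org
topic: topic_0622 Genomics
scope: summary
function: operation_3212 Genome indexing (Burrows-Wheeler)
input: format_1929 FASTA
output: format_2573 SAM
scope: bowtie-build
function: operation_3212 Genome indexing (Burrows-Wheeler)
input: format_1929 FASTA
output: data_3210 Genome index
"""


def parse_annotations(text):
    """Group 'key: value' lines by the most recent 'scope:' line.

    Hypothetical helper: lines seen before the first 'scope:' are
    treated as package-level annotations (an assumption).
    """
    result = {"ontology": None, "package": defaultdict(list), "scopes": {}}
    current = result["package"]
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        # Split at the first colon only, so URLs in the value survive.
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key.lower() == "ontology":
            result["ontology"] = value
            current = result["package"]
        elif key == "scope":
            # One group per scope value, e.g. 'summary' or a binary name.
            current = result["scopes"].setdefault(value, defaultdict(list))
        else:
            # 'id<blank>human_readable' becomes an (id, label) pair.
            term_id, _, label = value.partition(" ")
            current[key].append((term_id, label))
    return result


parsed = parse_annotations(SAMPLE)
print(parsed["scopes"]["bowtie-build"]["output"])
```

With one scope per binary, looking up what a single tool such as bowtie-build produces is a plain dictionary access, which is one reason the long representation seems easier to work with.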

Cheers
Sascha

On 20/11/2014 17:37, Booth, Timothy G. wrote:
> Dear all,
>
> This is Steffen and Tim sharing a desk at the Copenhagen Hackathon
> of the ELIXIR-DK Catalog of resources in computational biology.
> There is a general excitement of the collection of tools that
> are associated with Debian Med and its derivatives and we are
> here to
>
>  * help the ELIXIR folks to fill their database
>    - avoid redundancies
>    - render the catalog immediately functional to see Debian packages
>  * help ourselves
>    - gain extra expressiveness in our own descriptions
>      o by adopting the EDAM ontology [1]
>      o to have separate annotations for the packages as a whole
>        and individual tools (selected binaries in /usr/bin)
>    - have some extra visibility
>    - find additional users (bare metal and virtualised)
>    - explain to the world how inviting Debian is to have one's
>      software redistributed
>
> Catalog entries are meant to be provided by the maintainers of
> the software tools in the ELIXIR network. For resources (binaries)
> provided through the Linux distros, we could certainly just fall
> back to the information we already have, but we would like you
> (this list) to comment on the extension of the
>   debian/upstream/metadata
> file to accommodate also structured references to semantical
> catalogs like the EDAM ontology. There is a related effort by
> Matus to annotate the DebTags. The format we think about is like
>
>   Ontology: http://prefix.of.ontology.org
>   feature_name: ontological_description_of_that_feature
>   another_feature: id<blank>human_readable
>   scope: <list of binaries> | summary
>   feature_name: ...
>   another_feature: ...
>
> The features may differ between ontologies. We had a look at bowtie to see
> how it goes and we ended up with:
>
>   Ontology: http://edamontology.org
>   topic: topic_0622 Genomics
>   scope: summary
>   function: operation_3212 Genome indexing (Burrows-Wheeler)
>   function: operation_0292 Sequence alignment generation
>   input: data_2975 Nucleic acid sequence (raw)
>   input: format_1929 FASTA
>   input: format_1930 FASTQ
>   output: data_1383 Sequence alignment (nucleic acid)
>   output: format_2573 SAM
>   scope: bowtie-build
>   function: operation_3212 Genome indexing (Burrows-Wheeler)
>   input: data_2975 Nucleic acid sequence (raw)
>   input: format_1929 FASTA
>   output: data_3210 Genome index
>   output: ??? Bowtie index format EBWT
>   output: ??? Bowtie long index format EBTWL
>   scope: bowtie-inspect
>   function: operation_1813 Sequence retrieval
>   function: operation_0304 Metadata retrieval
>   function: operation_0228 Data index analysis
>   input: data_3210 Genome index
>   input: ??? Bowtie index format EBWT
>   input: ??? Bowtie long index format EBTWL
>   output: data_2975 Nucleic acid sequence (raw)
>   output: format_1929 FASTA
>   output: format_1964 plain text format (unformatted)
>   scope: bowtie
>   function: operation_0350 Sequence database search (by sequence using
>     word-based methods)
>   function: operation_0292 Sequence alignment generation
>   input: data_3210 Genome index
>   input: data_2975 Nucleic acid sequence (raw)
>   input: format_1964 plain text format (unformatted)
>   input: format_1929 FASTA
>   input: format_1930 FASTQ
>   output: data_1383 Sequence alignment (nucleic acid)
>   output: data_0867 Sequence alignment report
>   output: format_2573 SAM
>   output: ??? Bowtie alignment report format
>
> or if we want to reduce the level of detail to just the summary this could be
> compressed to:
>
>   Ontology: http://edamontology.org
>   topic: topic_0622 Genomics
>   function: operation_3212 Genome indexing (Burrows-Wheeler)
>   function: operation_0292 Sequence alignment generation
>   input: data_2975 Nucleic acid sequence (raw)
>   input: format_1929 FASTA
>   input: format_1930 FASTQ
>   output: data_1383 Sequence alignment (nucleic acid)
>   output: format_2573 SAM
>
> If the list likes this approach, then we can continue annotating a bit more
> and amend our task pages for it all.
> Some tools and suites (eg. EMBOSS) have existing annotations from other
> projects that we can inherit.
> We are not yet confident about what this effectively means e.g. for the
> Ultimate Debian Database. @Charles, can you
> direct us, please?
>
> Best regards from Copenhagen
>
> Steffen and Tim
>
> [1] EDAM ontology - http://bioportal.bioontology.org/ontologies/EDAM
>
> This message (and any attachments) is for the recipient only. NERC is subject
> to the Freedom of Information Act 2000 and the contents of this email and any
> reply you make may be disclosed by NERC unless it is exempt from release
> under the Act. Any material supplied to NERC may be stored in an electronic
> records management system.
>

--
The Wellcome Trust Sanger Institute is operated by Genome Research
Limited, a charity registered in England with number 1021457 and a
company registered in England with number 2742969, whose registered
office is 215 Euston Road, London, NW1 2BE.

