Re: ELIXIR tools registry participation - richer metadata for our packages

Sascha Steinbiss Fri, 21 Nov 2014 04:26:03 -0800

Dear Steffen and Tim,

this sounds cool, and useful for helping people find the right tool for a
job in the long run! I for one would prefer the long representation, with
one entry per binary. If I can find the time, I would even help update the
metadata files with this kind of information once the format of the
upstream/metadata file you are proposing is stable and documented.


Cheers
Sascha

On 20/11/2014 17:37, Booth, Timothy G. wrote:
> Dear all,
> 
> This is Steffen and Tim sharing a desk at the Copenhagen Hackathon
> of the ELIXIR-DK Catalog of resources in computational biology.
> There is general excitement about the collection of tools that
> are associated with Debian Med and its derivatives, and we are
> here to
> 
>  * help the ELIXIR folks to fill their database
>    - avoid redundancies
>    - make the catalog immediately able to show Debian packages
>  * help ourselves
>    - gain extra expressiveness in our own descriptions
>      o by adopting the EDAM ontology [1]
>      o by keeping separate annotations for the package as a whole
>        and for individual tools (selected binaries in /usr/bin)
>    - have some extra visibility
>    - find additional users (bare metal and virtualised)
>    - explain to the world how inviting Debian is to have one's
>      software redistributed
> 
> Catalog entries are meant to be provided by the maintainers of
> the software tools in the ELIXIR network. For resources (binaries)
> provided through the Linux distros, we could certainly just fall
> back to the information we already have, but we would like you
> (this list) to comment on the extension of the
>   debian/upstream/metadata
> file so that it also accommodates structured references to semantic
> catalogs such as the EDAM ontology. There is a related effort by
> Matus to annotate the DebTags. The format we have in mind looks like
> 
> Ontology: http://prefix.of.ontology.org
>  feature_name: ontological_description_of_that_feature
>  another_feature: id<blank>human_readable
>  scope: <list of binaries> | summary
>   feature_name: ...
>   another_feature: ...
> 
> The features may differ between ontologies. We had a look at bowtie to see 
> how it goes and we ended up with:
> 
> Ontology: http://edamontology.org
>  topic: topic_0622 Genomics
>  scope: summary
>    function: operation_3212 Genome indexing (Burrows-Wheeler)
>    function: operation_0292 Sequence alignment generation
>    input: data_2975 Nucleic acid sequence (raw)
>    input: format_1929 FASTA
>    input: format_1930 FASTQ
>    output: data_1383 Sequence alignment (nucleic acid)
>    output: format_2573 SAM
>  scope: bowtie-build
>    function: operation_3212 Genome indexing (Burrows-Wheeler)
>    input: data_2975 Nucleic acid sequence (raw)
>    input: format_1929 FASTA
>    output: data_3210 Genome index
>    output: ??? Bowtie index format EBWT
>    output: ??? Bowtie long index format EBWTL
>  scope: bowtie-inspect
>    function: operation_1813 Sequence retrieval
>    function: operation_0304 Metadata retrieval
>    function: operation_0228 Data index analysis
>    input: data_3210 Genome index
>    input: ??? Bowtie index format EBWT
>    input: ??? Bowtie long index format EBWTL
>    output: data_2975 Nucleic acid sequence (raw)
>    output: format_1929 FASTA
>    output: format_1964 plain text format (unformatted)
>  scope: bowtie
>    function: operation_0350 Sequence database search (by sequence using word-based methods)
>    function: operation_0292 Sequence alignment generation
>    input: data_3210 Genome index
>    input: data_2975 Nucleic acid sequence (raw)
>    input: format_1964 plain text format (unformatted)
>    input: format_1929 FASTA
>    input: format_1930 FASTQ
>    output: data_1383 Sequence alignment (nucleic acid)
>    output: data_0867 Sequence alignment report
>    output: format_2573 SAM
>    output: ??? Bowtie alignment report format
> 
> Or, if we want to reduce the level of detail to just the summary, this
> could be compressed to:
> 
> Ontology: http://edamontology.org
>  topic: topic_0622 Genomics
>  function: operation_3212 Genome indexing (Burrows-Wheeler)
>  function: operation_0292 Sequence alignment generation
>  input: data_2975 Nucleic acid sequence (raw)
>  input: format_1929 FASTA
>  input: format_1930 FASTQ
>  output: data_1383 Sequence alignment (nucleic acid)
>  output: format_2573 SAM
> 
> If the list likes this approach, then we can continue annotating a bit more
> and amend our task pages accordingly.
> Some tools and suites (e.g. EMBOSS) have existing annotations from other
> projects that we can inherit.
> We are not yet sure what this means in practice, e.g. for the Ultimate
> Debian Database. @Charles, can you point us in the right direction, please?
> 
> Best regards from Copenhagen
> 
> Steffen and Tim
> 
> [1] EDAM ontology - http://bioportal.bioontology.org/ontologies/EDAM
> 
> 
> 
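
To make the long form concrete, here is a minimal sketch of how such a block
could be read. It assumes that "scope:" opens a group and that more deeply
indented lines belong to it, which is one reading of the bowtie example
quoted above rather than a documented rule. The function names
parse_ontology_block and compress are hypothetical, and SAMPLE is a shortened
excerpt of the quoted annotation, not a complete entry.

```python
SAMPLE = """\
Ontology: http://edamontology.org
 topic: topic_0622 Genomics
 scope: summary
   function: operation_0292 Sequence alignment generation
   output: format_2573 SAM
 scope: bowtie-build
   function: operation_3212 Genome indexing (Burrows-Wheeler)
   input: format_1929 FASTA
"""


def parse_ontology_block(text):
    """Parse one 'Ontology:' block of the proposed (not yet finalised) layout."""
    result = {"ontology": None, "package": [], "scopes": {}}
    current = result["package"]  # entries before any 'scope:' describe the package
    scope_indent = None
    for raw in text.splitlines():
        if not raw.strip():
            continue
        indent = len(raw) - len(raw.lstrip())
        key, _, value = raw.strip().partition(":")
        value = value.strip()
        if key == "Ontology":
            result["ontology"] = value
        elif key == "scope":
            # Assumption: the scope value is kept as one opaque name.
            scope_indent = indent
            current = result["scopes"].setdefault(value, [])
        else:
            if scope_indent is not None and indent <= scope_indent:
                # Dedented back to package level: leave the current scope.
                current, scope_indent = result["package"], None
            # Values look like "<id><blank><human readable label>".
            term_id, _, label = value.partition(" ")
            current.append((key, term_id, label.strip()))
    return result


def compress(parsed):
    """Package-level entries plus the 'summary' scope, i.e. the short form."""
    return parsed["package"] + parsed["scopes"].get("summary", [])


parsed = parse_ontology_block(SAMPLE)
for key, term_id, label in compress(parsed):
    print(f"{key}: {term_id} {label}")
```

Entries are kept as a list of (feature, id, label) tuples rather than a
dictionary because features such as "function" and "input" repeat within one
scope. Storing the long form and deriving the short one, as compress does
here, would let both representations come from a single annotation.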



