Re: ELIXIR tools registry participation - richer metadata for our packages

Andreas Tille Fri, 12 Feb 2016 23:48:07 -0800

Hi,

I'd like to add to this old thread some results from a later meeting in
Copenhagen: our Debian Med sprint last week, kindly sponsored by DTU.


I can confirm that I imported the EDAM data files from Debian Med
packages into UDD.  You can easily query all of this data with a script
I provided on GitHub:

    https://github.com/bio-tools/biotoolsConnect/blob/master/DebianMed/edam.sh

Feel free to run this script on any Linux machine with a psql client.

If you lack such a machine but are a member of the Debian Med team,
you can run:

    rsync edam.sh alioth.debian.org:
    ssh alioth.debian.org
    ./edam.sh

I'd be happy if some of the EDAM people could confirm that this works
for them.

Kind regards

      Andreas.

On Thu, Nov 20, 2014 at 05:37:10PM +0000, Booth, Timothy G. wrote:
> Dear all,
> 
> This is Steffen and Tim sharing a desk at the Copenhagen Hackathon
> of the ELIXIR-DK Catalog of resources in computational biology.
> There is general excitement about the collection of tools that
> are associated with Debian Med and its derivatives, and we are
> here to
> 
>  * help the ELIXIR folks to fill their database
>    - avoid redundancies
>    - render the catalog immediately functional to see Debian packages
>  * help ourselves
>    - gain extra expressiveness in our own descriptions
>      o by adopting the EDAM ontology [1]
>      o to have separate annotations for the packages as a whole
>        and individual tools (selected binaries in /usr/bin)
>    - have some extra visibility
>    - find additional users (bare metal and virtualised)
>    - explain to the world how inviting Debian is to have one's
>      software redistributed
> 
> Catalog entries are meant to be provided by the maintainers of
> the software tools in the ELIXIR network. For resources (binaries)
> provided through the Linux distros, we could certainly just fall
> back to the information we already have, but we would like you
> (this list) to comment on the extension of the
>   debian/upstream/metadata
> file to also accommodate structured references to semantic
> catalogs like the EDAM ontology. There is a related effort by
> Matus to annotate the DebTags. The format we have in mind looks like
> 
> Ontology: http://prefix.of.ontology.org
>  feature_name: ontological_description_of_that_feature
>  another_feature: id<blank>human_readable
>  scope: <list of binaries> | summary
>   feature_name: ...
>   another_feature: ...
> 
> The features may differ between ontologies. We had a look at bowtie to see 
> how it goes and we ended up with:
> 
> Ontology: http://edamontology.org
>  topic: topic_0622 Genomics
>  scope: summary
>    function: operation_3212 Genome indexing (Burrows-Wheeler)
>    function: operation_0292 Sequence alignment generation
>    input: data_2975 Nucleic acid sequence (raw)
>    input: format_1929 FASTA
>    input: format_1930 FASTQ
>    output: data_1383 Sequence alignment (nucleic acid)
>    output: format_2573 SAM
>  scope: bowtie-build
>    function: operation_3212 Genome indexing (Burrows-Wheeler)
>    input: data_2975 Nucleic acid sequence (raw)
>    input: format_1929 FASTA
>    output: data_3210 Genome index
>    output: ??? Bowtie index format EBWT
>    output: ??? Bowtie long index format EBWTL
>  scope: bowtie-inspect
>    function: operation_1813 Sequence retrieval
>    function: operation_0304 Metadata retrieval
>    function: operation_0228 Data index analysis
>    input: data_3210 Genome index
>    input: ??? Bowtie index format EBWT
>    input: ??? Bowtie long index format EBWTL
>    output: data_2975 Nucleic acid sequence (raw)
>    output: format_1929 FASTA
>    output: format_1964 plain text format (unformatted)
>  scope: bowtie
>    function: operation_0350 Sequence database search (by sequence using word-based methods)
>    function: operation_0292 Sequence alignment generation
>    input: data_3210 Genome index
>    input: data_2975 Nucleic acid sequence (raw)
>    input: format_1964 plain text format (unformatted)
>    input: format_1929 FASTA
>    input: format_1930 FASTQ
>    output: data_1383 Sequence alignment (nucleic acid)
>    output: data_0867 Sequence alignment report
>    output: format_2573 SAM
>    output: ??? Bowtie alignment report format
> 
> or if we want to reduce the level of detail to just the summary this could be 
> compressed to:
> 
> Ontology: http://edamontology.org
>  topic: topic_0622 Genomics
>  function: operation_3212 Genome indexing (Burrows-Wheeler)
>  function: operation_0292 Sequence alignment generation
>  input: data_2975 Nucleic acid sequence (raw)
>  input: format_1929 FASTA
>  input: format_1930 FASTQ
>  output: data_1383 Sequence alignment (nucleic acid)
>  output: format_2573 SAM
> 
> If the list likes this approach, then we can continue annotating a bit
> more and amend our task pages accordingly.
> Some tools and suites (e.g. EMBOSS) have existing annotations from
> other projects that we can inherit.
> We are not yet sure what this means in practice, e.g. for the
> Ultimate Debian Database. @Charles, can you direct us, please?
> 
> Best regards from Copenhagen
> 
> Steffen and Tim
> 
> [1] EDAM ontology - http://bioportal.bioontology.org/ontologies/EDAM
> 

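To make the annotation format quoted above more concrete, here is a
minimal Python sketch of how such an "Ontology:" block could be read
into a nested structure.  It is only an illustration under a few
assumptions that are not part of the proposal: the quote prefix has
already been stripped, lines indented deeper than a "scope:" line belong
to that scope, the "???" placeholder means "no term id yet", and the
scope value is kept as a raw string because the separator for a list of
binaries was not specified.  The function name parse_annotation is
hypothetical.

```python
from collections import defaultdict


def parse_annotation(text):
    """Parse one 'Ontology:' block into a nested dict (illustrative only)."""
    result = {"ontology": None, "features": defaultdict(list), "scopes": {}}
    current = None       # feature dict of the scope being filled, if any
    scope_indent = None  # indentation of the most recent 'scope:' line

    for raw in text.splitlines():
        if not raw.strip():
            continue
        indent = len(raw) - len(raw.lstrip(" "))
        # Split at the first colon only, so URLs in the value stay intact.
        key, _, value = raw.strip().partition(":")
        key, value = key.strip(), value.strip()

        if key == "Ontology":
            result["ontology"] = value
        elif key == "scope":
            # Assumption: the scope value is stored verbatim.
            current = result["scopes"].setdefault(value, defaultdict(list))
            scope_indent = indent
        else:
            # "id<blank>human_readable": the id ends at the first blank.
            term_id, _, label = value.partition(" ")
            # Assumption: '???' marks a term that has no id yet.
            entry = (None if term_id == "???" else term_id, label.strip())
            if current is not None and indent > scope_indent:
                current[key].append(entry)
            else:
                # Not nested under a scope: a package-level feature.
                current = None
                result["features"][key].append(entry)
    return result


SAMPLE = """\
Ontology: http://edamontology.org
 topic: topic_0622 Genomics
 scope: bowtie-build
   function: operation_3212 Genome indexing (Burrows-Wheeler)
   input: format_1929 FASTA
   output: data_3210 Genome index
   output: ??? Bowtie index format EBWT
"""

parsed = parse_annotation(SAMPLE)
for scope, features in parsed["scopes"].items():
    print(scope, dict(features))
```

The compressed, summary-only variant has no "scope:" lines, so with this
sketch all of its entries would simply end up under "features".
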
-- 
http://fam-tille.de
