On 19/07/2012 05:17, Charles Plessy wrote:
Dear EMBOSS developers,

Today I received the following bug report about the quantity of data shipped in
the Debian package for EMBOSS.

emboss-data recently grew from a slim 5 megabytes to a massive 305.
Closer inspection reveals the primary culprits to be large taxonomy
and gene ontology databases:

In Debian, one solution would be to move this data into a separate optional
package.  But before doing so, I would like to ask whether this data really
ought to be distributed with EMBOSS.  After all, for many other databases,
there are scripts to download and index the data after installation.  Will
EMBOSS 6.5 ship the taxonomy and gene ontology databases as well?

They are included in the release which appeared on 15th July (announcement in preparation).

For developers the data is updated by rsync. We could provide scripts to download and index the data at the end of installation, though I found in preparing the release that two ontologies had moved in the last year, so that approach is error-prone.

Some EMBOSS applications assume these databases are installed, particularly EDAM and the NCBI taxonomy. EDAM is used for all metadata, and the taxonomy for organism searches in data retrieval. The Gene Ontology is used when analysing GO terms in metadata.

So if they are moved to an optional package, some things will not work when that package is not installed. I would say EDAM is essential.
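
As a rough illustration of the dependency described above, a packager could check after installation which of these databases are actually present. The sketch below is only an assumption about how such a check might look: the subdirectory names (EDAM, TAXONOMY, GO) and the function name are hypothetical and are not taken from the EMBOSS distribution, so the real data layout should be checked before relying on it.

```python
from pathlib import Path

# Hypothetical subdirectory names under the EMBOSS data directory.
# The real layout may differ; adjust after inspecting an installation.
EXPECTED_DATABASES = {
    "EDAM": "metadata for all applications (essential)",
    "TAXONOMY": "organism searches in data retrieval",
    "GO": "analysis of GO terms in metadata",
}


def missing_databases(data_root, expected=EXPECTED_DATABASES):
    """Return {name: purpose} for each expected database directory
    that is absent or empty under data_root."""
    root = Path(data_root)
    missing = {}
    for name, purpose in expected.items():
        directory = root / name
        # A directory counts as installed only if it exists and holds
        # at least one entry.
        if not directory.is_dir() or not any(directory.iterdir()):
            missing[name] = purpose
    return missing
```

Calling missing_databases() on the data directory returns an empty dictionary when everything is in place. A non-empty result lists what the optional package would still need to supply and which functionality depends on it.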

regards,

Peter Rice
EMBOSS Team

_______________________________________________
EMBOSS mailing list
EMBOSS@lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/emboss
