Hello all,

Although they are these days also offering XML for many tools,
the NCBI still make heavy use of the older ASN.1 file format
(both as plain text and binary). This crops up in BLAST (e.g.
as the BLAST archive format, or as dustmasker output), in
the Entrez Utilities (e.g. for sequence data as an alternative
to GenBank or FASTA format, or for PubMed records) and also
for 3D structures.

I think it could make sense to define generic 'asn1' and
'asn1-binary' formats in the Galaxy core (name suggestions
welcome), and even perhaps 'ncbi-asn1' and 'ncbi-asn1-binary'
too. Then ToolShed entries can define domain specific
subclasses. For instance, the BLAST+ wrapper could include
definitions for the dustmasker output, and perhaps the BLAST
archive format too. Separately anyone working with 3D
structures as ASN.1 could define another sub-format, etc.
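
To make the proposed hierarchy concrete, here is a rough standalone
sketch. The classes are stand-ins and do not use Galaxy's actual
datatype API. The file extensions other than 'asn1' and 'ncbi-asn1',
the guess_ext helper, and the top-level type names used for sniffing
("Blast-db-mask-info", "Blast4-archive") are my assumptions for
illustration and would need checking against real files.

```python
import re


class Asn1:
    """Generic plain-text ASN.1 (the proposed core 'asn1' format)."""
    file_ext = "asn1"
    mime_type = "text/plain"
    root_type = None  # None means "accept any top-level type name"

    @classmethod
    def sniff(cls, head):
        # Text ASN.1 values start with "Type-name ::= ..."
        match = re.match(r"\s*([A-Za-z][A-Za-z0-9-]*)\s*::=", head)
        if match is None:
            return False
        return cls.root_type is None or match.group(1) == cls.root_type


class NcbiAsn1(Asn1):
    """Proposed 'ncbi-asn1' format; same sniffing as the generic one."""
    file_ext = "ncbi-asn1"


class DustmaskerAsn1(NcbiAsn1):
    """Hypothetical sub-format a BLAST+ wrapper could define."""
    file_ext = "maskinfo-asn1"        # assumed extension name
    root_type = "Blast-db-mask-info"  # assumed top-level type


class BlastArchiveAsn1(NcbiAsn1):
    """Hypothetical sub-format for the BLAST archive output."""
    file_ext = "blast-archive-asn1"   # assumed extension name
    root_type = "Blast4-archive"      # assumed top-level type


def guess_ext(head, datatypes=(DustmaskerAsn1, BlastArchiveAsn1, Asn1)):
    """Try the most specific formats first, then fall back to generic."""
    for datatype in datatypes:
        if datatype.sniff(head):
            return datatype.file_ext
    return None
```

The point is only that domain-specific formats narrow the generic
sniffer by fixing the expected top-level type, so anything not
recognised still falls back to plain 'asn1'.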

I see this as directly analogous to the assorted XML file formats
already in Galaxy, which are defined as subclasses of the XML
format included with the Galaxy core.

Would a pull request implementing this be acceptable?

Peter

P.S. Does anyone know an authoritative source for the MIME
types used by the NCBI? Using the BLAST website they
offer plain text ASN.1 just as text/plain, likewise efetch also
seems to use text/plain for ASN.1 downloads. However, I've
seen references to the chemical/ncbi-asn1-ascii and
chemical/ncbi-asn1-binary MIME types, e.g.
http://www.ncbi.nlm.nih.gov/data_specs/asn/NCBI_all.asn

i.e. it appears that 3D structure NCBI ASN.1 files use
a well-defined MIME type, while most NCBI ASN.1 text
files default to text/plain, which we can handle nicely in
Galaxy as subclasses.