Hello all,

Although they are these days also offering XML for many tools, the NCBI still make heavy use of the older ASN.1 file format (both as plain text and as binary). This crops up in BLAST (e.g. as the BLAST archive format, or as dustmasker output), in the Entrez Utilities (e.g. for sequence data as an alternative to GenBank or FASTA format, or for PubMed records) and also for 3D structures.

I think it could make sense to define generic 'asn1' and 'asn1-binary' formats in the Galaxy core (name suggestions welcome), and perhaps even 'ncbi-asn1' and 'ncbi-asn1-binary' too. ToolShed entries could then define domain-specific subclasses. For instance, the BLAST+ wrapper could include definitions for the dustmasker output, and perhaps the BLAST archive format too. Separately, anyone working with 3D structures as ASN.1 could define another sub-format, etc.

I see this as a clear analogy to the assorted XML file formats in existence - these are defined in Galaxy as subclasses of the core XML format included with the Galaxy core.

Would a pull request implementing this be acceptable?

Peter

P.S. Does anyone know an authoritative source for the MIME types used by the NCBI? The BLAST website offers plain text ASN.1 just as text/plain, and efetch likewise seems to use text/plain for ASN.1 downloads. However, I've seen the MIME types chemical/ncbi-asn1-ascii and chemical/ncbi-asn1-binary mentioned, e.g. in http://www.ncbi.nlm.nih.gov/data_specs/asn/NCBI_all.asn

i.e. it appears that 3D structure NCBI ASN.1 files use a well defined MIME type, while most NCBI ASN.1 text files default to text/plain - which we can handle nicely in Galaxy as subclasses.
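
To make the proposal above a little more concrete, here is a rough standalone sketch of the kind of class hierarchy I have in mind. It does not use Galaxy's actual datatype API: the class names, the 'blast-archive-asn1' extension, the sniff() signature and the guess_format() helper are all made up for illustration, and the top-level type name "Blast4-archive" is my assumption about how a plain text BLAST archive begins. The only thing it relies on is that plain text ASN.1 values start with a type name followed by "::=".

```python
import re

# Hypothetical sketch only: these classes are NOT Galaxy's datatype API.
# Plain text ASN.1 values begin with "<TypeName> ::=".
_HEADER = re.compile(r"^\s*([A-Za-z][A-Za-z0-9-]*)\s*::=")


class Asn1Text:
    """Generic plain text ASN.1 format (the proposed core 'asn1')."""

    file_ext = "asn1"
    top_level_type = None  # None means any top-level type is accepted

    @classmethod
    def sniff(cls, text):
        """Return True if the text looks like this ASN.1 format."""
        match = _HEADER.match(text)
        if match is None:
            return False
        return cls.top_level_type in (None, match.group(1))


class BlastArchiveAsn1(Asn1Text):
    """Domain-specific subclass, as a ToolShed wrapper might define."""

    file_ext = "blast-archive-asn1"  # made-up extension name
    top_level_type = "Blast4-archive"  # assumed top-level type name


def guess_format(text, candidates):
    """Return the file_ext of the first candidate whose sniff() matches.

    List the most specific formats first, the generic one last.
    """
    for datatype in candidates:
        if datatype.sniff(text):
            return datatype.file_ext
    return None
```

With the candidates ordered as [BlastArchiveAsn1, Asn1Text], a file starting with the assumed BLAST archive header would be picked up by the subclass, any other plain text ASN.1 file would fall back to the generic 'asn1' format, and non-ASN.1 text would match neither. The binary formats would need a separate base class, which I have not sketched here.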