Hello all,

What is the current status in Galaxy for supporting compressed files?

We've talked about this before; for example, in addition to FASTQ,
many of us have expressed a wish to work with gzipped FASTQ.
I understand that some have customized their local Galaxy
installations to use gzipped FASTQ as a specific data type, but I'm
more interested in a general, file-format-neutral solution.

Also, I'd like to be able to use BGZF (not just GZIP) because it
is better for random access and makes it much easier to break up
large data files for sharing over a cluster; for example, it could
be exploited in the current Galaxy code for splitting large sequence
files. See:
http://blastedbio.blogspot.com/2011/11/bgzf-blocked-bigger-better-gzip.html
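To make the random-access point concrete, here is a minimal Python
sketch of the BGZF layout as described in the SAM/BAM specification:
each block is an ordinary gzip member whose header carries a 'BC'
extra subfield recording the block size, so standard gzip tools still
read the whole file, while a reader that knows where a block starts
can decompress that block alone. The function names and the
65280-byte chunk size are my own choices for illustration, not Galaxy
or htslib code. Note the blocks are cut at fixed byte counts, not on
sequence record boundaries, so splitting by record would need more
bookkeeping than this.

```python
import gzip
import struct
import zlib

# Assumed payload size per block (the convention bgzip-style writers use),
# leaving headroom so each compressed block stays under the 64 KiB limit.
MAX_BLOCK_INPUT = 65280


def make_bgzf_block(data: bytes) -> bytes:
    """Compress `data` into one BGZF block: a gzip member whose header has a
    'BC' extra subfield holding BSIZE (total block size minus one)."""
    compressor = zlib.compressobj(zlib.Z_DEFAULT_COMPRESSION, zlib.DEFLATED, -15)
    cdata = compressor.compress(data) + compressor.flush()  # raw deflate stream
    bsize = 18 + len(cdata) + 8 - 1  # header + payload + trailer, minus one
    # ID1 ID2 CM FLG(FEXTRA) MTIME XFL OS XLEN | SI1='B' SI2='C' SLEN BSIZE
    header = struct.pack("<BBBBIBBHBBHH", 31, 139, 8, 4, 0, 0, 255, 6, 66, 67, 2, bsize)
    trailer = struct.pack("<II", zlib.crc32(data) & 0xFFFFFFFF, len(data) & 0xFFFFFFFF)
    return header + cdata + trailer


def write_bgzf(data: bytes):
    """Split `data` into independently compressed blocks. Returns the
    compressed bytes and the offset at which each data block starts."""
    out = bytearray()
    offsets = []
    for start in range(0, len(data), MAX_BLOCK_INPUT):
        offsets.append(len(out))
        out += make_bgzf_block(data[start:start + MAX_BLOCK_INPUT])
    out += make_bgzf_block(b"")  # empty block serves as the BGZF end-of-file marker
    return bytes(out), offsets


def read_block_at(bgzf: bytes, offset: int) -> bytes:
    """Decompress only the block starting at `offset`, ignoring all earlier
    data. Assumes the fixed 18-byte header written by make_bgzf_block."""
    (bsize,) = struct.unpack_from("<H", bgzf, offset + 16)
    block = bgzf[offset:offset + bsize + 1]
    return zlib.decompress(block[18:-8], -15)


if __name__ == "__main__":
    records = b"".join(b"@read%d\nACGT\n+\nIIII\n" % i for i in range(20000))
    bgzf, offsets = write_bgzf(records)
    print(len(offsets), "data blocks")
    # The whole file is still readable as ordinary (multi-member) gzip...
    assert gzip.decompress(bgzf) == records
    # ...but block 2 can be decompressed on its own, given only its offset.
    assert read_block_at(bgzf, offsets[2]) == records[2 * MAX_BLOCK_INPUT:3 * MAX_BLOCK_INPUT]
```

The block offsets are what an index (such as the one tabix builds) would
record; a cluster job could be handed one offset range to work on
without decompressing everything before it.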

The 11 May 2012 Galaxy Development News Brief
http://lists.bx.psu.edu/pipermail/galaxy-dev/2012-May/009757.html
mentions tabix indexing, which uses bgzip. So is there something
general in place yet that lets tool wrappers declare that they accept
not just given file formats, but also compressed versions of them?

Ideally I'd like to be able to write an XML tool description saying
that a tool produces BGZF-compressed tabular data, GZIP-compressed
Sanger FASTQ, etc. Similarly, I'd like to specify that my tool
accepts FASTA or gzipped FASTA (including BGZF FASTA). Meanwhile, for
older tools that say they accept only uncompressed FASTA, Galaxy
could automatically decompress any compressed FASTA entries in my
history on demand.
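As a rough illustration of the on-demand idea, here is a Python sketch
of how a history entry could be sniffed as plain, GZIP or BGZF from its
first 18 bytes, and decompressed only when the receiving tool does not
declare that it accepts that form. The helper names and the
"accepted" set are hypothetical, not part of Galaxy. The BGZF check
follows the fixed header layout that bgzip writes; a stricter check
would scan every extra subfield.

```python
import gzip
import struct
from typing import Optional


def sniff_compression(head: bytes) -> Optional[str]:
    """Classify the start of a file as 'bgzf', 'gzip' or None (uncompressed)."""
    if head[:2] != b"\x1f\x8b":
        return None  # no gzip magic number
    # BGZF: FEXTRA flag set, XLEN == 6, then a 'BC' subfield of length 2.
    if (
        len(head) >= 18
        and head[3] & 4
        and struct.unpack_from("<H", head, 10)[0] == 6
        and head[12:14] == b"BC"
        and struct.unpack_from("<H", head, 14)[0] == 2
    ):
        return "bgzf"
    return "gzip"


def prepare_input(data: bytes, accepted: frozenset) -> bytes:
    """Hypothetical hook: pass data through if the tool accepts its form,
    otherwise decompress it. BGZF is multi-member gzip, so gzip.decompress
    handles both compressed kinds."""
    kind = sniff_compression(data[:18])
    if kind in accepted:
        return data
    if kind is None:
        raise ValueError("tool requires compressed input but data is uncompressed")
    return gzip.decompress(data)


if __name__ == "__main__":
    fasta = b">seq1\nACGTACGT\n"
    old_tool = frozenset({None})                  # accepts only uncompressed FASTA
    new_tool = frozenset({None, "gzip", "bgzf"})  # accepts any of the three
    gz = gzip.compress(fasta)
    print(sniff_compression(gz[:18]))             # gzip
    print(prepare_input(gz, old_tool) == fasta)   # True: decompressed on demand
    print(prepare_input(gz, new_tool) is gz)      # True: passed through untouched
```

The design choice here is that the tool declares what it can read and
the framework adapts the data, rather than each compressed variant
becoming a separate data type.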

Peter
___________________________________________________________
Please keep all replies on the list by using "reply all"
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:

  http://lists.bx.psu.edu/
