Ahh, sorry. I finally found the description of BGZF in the SAM format
specification, and it seems that it is 100% GZIP-compatible. There is
still the issue of needing an external file index, since all BGZF seems to give 
you is the size of the compressed block, not anything format-specific, like the 
number of sequences in the block.
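
To make the point above concrete, here is a minimal Python sketch of a BGZF-style block as I understand the layout from the SAM specification: an ordinary gzip member whose extra field carries a "BC" subfield holding the total block size minus one. The helper names bgzf_block and block_offsets are my own, not from any library, and the reader assumes "BC" is the first extra subfield (true for blocks written by this sketch, not guaranteed in general).

```python
import gzip
import struct
import zlib

def bgzf_block(data: bytes) -> bytes:
    """Pack data (well under 64 KiB) into one BGZF-style gzip member."""
    comp = zlib.compressobj(6, zlib.DEFLATED, -15)  # raw deflate, no zlib wrapper
    cdata = comp.compress(data) + comp.flush()
    # 18 header bytes + payload + 8 trailer bytes, stored as "total size minus 1"
    bsize = len(cdata) + 25
    header = struct.pack(
        "<BBBBIBBHBBHH",
        31, 139,   # gzip magic
        8,         # CM: deflate
        4,         # FLG: FEXTRA set
        0,         # MTIME
        0, 255,    # XFL, OS (unknown)
        6,         # XLEN: one 6-byte subfield
        66, 67,    # subfield id "BC"
        2,         # subfield payload length
        bsize,
    )
    trailer = struct.pack("<II", zlib.crc32(data), len(data))
    return header + cdata + trailer

def block_offsets(buf: bytes) -> list:
    """Walk the blocks using only the stored size; nothing is decompressed."""
    offsets, pos = [], 0
    while pos < len(buf):
        offsets.append(pos)
        (bsize,) = struct.unpack_from("<H", buf, pos + 16)  # assumes BC comes first
        pos += bsize + 1
    return offsets

blob = bgzf_block(b"@r1\nACGT\n+\nIIII\n") + bgzf_block(b"@r2\nTTGA\n+\nIIII\n")
print(block_offsets(blob))             # compressed offsets of the two blocks
print(gzip.decompress(blob).decode())  # the plain gzip reader accepts the blocks
```

The stored size is enough to hop from block to block, but it says nothing about how many sequences each block holds, which is why a separate index is still needed.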

In any case, whether it's GZIP or BGZF, it seems the solutions are very 
similar, and porting my work should be pretty simple - I just used larger 
blocks and put all the data in the index file and none in the headers.
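
As a rough illustration of the "everything in the index file" idea, here is a Python sketch. It is not the actual implementation (which is not public); the function names, the fixed records-per-block layout, and the in-memory index of (offset, first record) pairs are all assumptions made for the example. Each block is an independent gzip member, so the whole stream remains readable by any gzip-aware tool.

```python
import gzip
import zlib

def write_blocks(records, per_block):
    """Compress records in groups; return (bytes, index of (offset, first_record))."""
    out = bytearray()
    index = []
    for start in range(0, len(records), per_block):
        index.append((len(out), start))  # where this block begins in the output
        out += gzip.compress(b"".join(records[start:start + per_block]))
    return bytes(out), index

def read_block(buf, offset):
    """Decompress exactly one gzip member starting at offset."""
    d = zlib.decompressobj(wbits=31)  # 31 = expect a gzip header and trailer
    return d.decompress(buf[offset:])  # stops at the end of the first member

def fetch(buf, index, n, per_block):
    """Return FASTQ record n (4 lines) by touching only the block that holds it."""
    offset, first = index[n // per_block]
    lines = read_block(buf, offset).splitlines(keepends=True)
    k = (n - first) * 4
    return b"".join(lines[k:k + 4])

records = [b"@r%d\nACGT\n+\nIIII\n" % i for i in range(10)]
blob, index = write_blocks(records, per_block=4)
print(fetch(blob, index, 7, per_block=4).decode())
```

In a real file the index would be written alongside the data, and the slice in read_block would become a seek followed by a bounded read.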

John Duddy
Sr. Staff Software Engineer
Illumina, Inc.
9885 Towne Centre Drive
San Diego, CA 92121
Tel: 858-736-3584
E-mail: jdu...@illumina.com


-----Original Message-----
From: Peter Cock [mailto:p.j.a.c...@googlemail.com] 
Sent: Tuesday, November 08, 2011 4:04 PM
To: Duddy, John
Cc: Greg Von Kuster; galaxy-dev@lists.bx.psu.edu; Nate Coraor
Subject: Re: [galaxy-dev] Tool shed and datatypes

On Tue, Nov 8, 2011 at 11:45 PM, Duddy, John <jdu...@illumina.com> wrote:
> It's not public yet, and it involves a little conundrum - we want
> it so we can support large amounts of data efficiently on a variety
> of aligners, including our ELAND from CASAVA. However, ELAND
> does not support unaligned BAM inputs yet, and apparently it
> would be a lot of work to make it so (and another team's area
> of responsibility as well).

OK, so using (unaligned) BAM isn't about to happen.

> So in the near term, BGZF would not meet our needs.
>

I don't follow you there: BAM != BGZF.

We can use BGZF to compress FASTQ, FASTA, GenBank,
basically anything. You get compression approaching that
of plain GZIP (depending on the characteristics of the data)
plus efficient random access.

> However, work is quite far along on a GZIP-based one
> that works with ELAND and BWA, since they both read
> GZIP FASTQ files, and works/will work with a converter
> to fastq_sanger for other tools.
>
> I can put you in touch with the engineer doing the work if
> you are interested.

That might be a good idea, or ask them to post here?

Peter
