On Sep 2, 2011, at 3:02 PM, Edward Kirton wrote:

>> What, like a BAM file of unaligned reads? Uses gzip compression, and
>> tracks the pairing information explicitly :) Some tools will already take
>> this as an input format, but not all.
> ah, yes, precisely.  i actually think illumina's pipeline produces
> files in this format now.
> wrappers which create a temporary fastq file would need to be created
> but that's easy enough.

My argument against that is that the cost of going from BAM -> temp FASTQ may
be prohibitive: if very large temp FASTQ files have to be generated on the fly
as input for various applications, one may end up just keeping a permanent
FASTQ around anyway.  One could probably get better performance out of a
simpler format that removes most of the 'AM' parts of BAM.  Or is the idea that
the file itself is modified, like a database?  And how would indexing work (BAM
uses binning on the match to the reference seq), or does it matter?
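
To make the indexing question concrete, here is a rough sketch of the binning
calculation, ported to Python from the reg2bin() function given in the SAM/BAM
spec.  The Python port and the example coordinates are mine and are only
illustrative.  The point is that the bin is derived purely from the alignment
coordinates, so reads with no position all collapse into a single bin and the
index cannot tell them apart.

```python
def reg2bin(beg, end):
    """Bin number for a zero-based, half-open interval [beg, end).

    Port of reg2bin() from the SAM/BAM spec: the smallest bin that
    fully contains the interval is chosen, from 16 kbp bins upward.
    """
    end -= 1
    if beg >> 14 == end >> 14:
        return ((1 << 15) - 1) // 7 + (beg >> 14)  # 16 kbp bins
    if beg >> 17 == end >> 17:
        return ((1 << 12) - 1) // 7 + (beg >> 17)  # 128 kbp bins
    if beg >> 20 == end >> 20:
        return ((1 << 9) - 1) // 7 + (beg >> 20)   # 1 Mbp bins
    if beg >> 23 == end >> 23:
        return ((1 << 6) - 1) // 7 + (beg >> 23)   # 8 Mbp bins
    if beg >> 26 == end >> 26:
        return ((1 << 3) - 1) // 7 + (beg >> 26)   # 64 Mbp bins
    return 0                                        # whole reference


# An aligned read covering the first 100 bases of a reference.
print(reg2bin(0, 100))   # 4681

# An unplaced read (pos = -1, treated as 1 bp long): every such read
# gets this same bin, so binning says nothing useful about it.
print(reg2bin(-1, 0))    # 4680
```

So for a file of unaligned reads, any useful index would have to key on
something else (read name, or byte offset of every Nth record, say).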

I recall HDF5 was planned as an alternate format (PacBio uses it, IIRC), and of
course there is NCBI's .sra format.  Anyone using the latter two?

