Hello Peter,

On Mar 18, 2011, at 11:26 AM, Peter Cock wrote:

> 
> I've just updated my test Galaxy instance to get the 5221:b5ecb8f4839d
> fix, and I now get a different behaviour - still an error state.
> 
> Data type: auto
> Build: ?
> Miscellaneous information: The uploaded files need grooming, so
> change your Copy data into Galaxy? selection to be Copy files into
> Galaxy instead of Link to files without copying into Galaxy so
> grooming can be performed.
> error
> 
> Presumably Galaxy uses 'Grooming' in several settings (e.g.
> FASTQ) to mean 'data sanitising', and what that message is
> trying to tell me is Galaxy doesn't think my BAM file is sorted
> (and therefore needs 'grooming'). Right?


This is correct.


> 
> Having checked my BAM files with samtools, I can confirm they
> don't have the SO header.
> 
> samtools view -H myfile.bam | grep "SO:"
> 
> They were generated with BWA in a split+merge pipeline to use
> multiple cores. I suppose I could run samtools reheader on them...
> but it would be nice to avoid that.
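
For reference, the header check above can also be scripted. Below is a minimal sketch in Python. It assumes the plain-text header has already been obtained, for example from `samtools view -H myfile.bam`. The function names and the example headers are made up for illustration. Per the SAM spec, the sort order is recorded in the SO tag of the @HD line. A missing SO tag means the header makes no claim either way, not that the records are out of order, so a header check on its own cannot settle the question.

```python
def sort_order_from_header(header_text):
    """Return the SO value from the @HD line of a SAM header, or None.

    header_text is the plain-text header, e.g. the output of
    `samtools view -H myfile.bam`.
    """
    for line in header_text.splitlines():
        if not line.startswith("@HD"):
            continue
        # Fields after the record type are tab-separated TAG:VALUE pairs.
        for field in line.split("\t")[1:]:
            tag, _, value = field.partition(":")
            if tag == "SO":
                return value
        return None  # @HD line present, but it has no SO tag
    return None  # no @HD line at all


def header_says_coordinate_sorted(header_text):
    return sort_order_from_header(header_text) == "coordinate"


# Made-up example headers: one without any sort-order information,
# one that declares coordinate sorting.
merged_header = "@SQ\tSN:chr1\tLN:1000\n@PG\tID:bwa\tPN:bwa\n"
sorted_header = "@HD\tVN:1.0\tSO:coordinate\n@SQ\tSN:chr1\tLN:1000\n"

print(sort_order_from_header(merged_header))         # None
print(header_says_coordinate_sorted(sorted_header))  # True
```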


Change set 5256:4acde9321b63 now includes more robust checking of whether a BAM
file is sorted.  With samtools 0.1.13 or newer, attempting to index an unsorted
BAM file produces an error, and we take advantage of this in our checks.
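
The indexing-based check can be sketched along these lines, assuming a Python wrapper around the samtools command line. This is an illustration, not the code in change set 5256:4acde9321b63. The function name and the `samtools_cmd` parameter are hypothetical. The sketch builds a throwaway index in a temporary directory and treats a non-zero exit status or any stderr output as a failure. Other problems, such as a truncated file, also make indexing fail, so a real check would inspect the error text before deciding to sort.

```python
import os
import subprocess
import tempfile


def indexes_cleanly(bam_path, samtools_cmd="samtools"):
    """Try to index bam_path; return True only if samtools succeeded quietly.

    Assumption: with samtools 0.1.13 or newer, indexing an unsorted BAM
    fails, so False *suggests* the file needs sorting (but a corrupt or
    truncated BAM fails too). If samtools_cmd cannot be found,
    subprocess.run raises FileNotFoundError.
    """
    with tempfile.TemporaryDirectory() as tmp:
        # Write the throwaway index into a temp dir so no stray .bai is
        # left next to the input file.
        index_path = os.path.join(tmp, "check.bai")
        result = subprocess.run(
            [samtools_cmd, "index", bam_path, index_path],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.PIPE,
            text=True,
        )
    # Any non-zero exit status or any stderr output counts as "not clean".
    return result.returncode == 0 and not result.stderr.strip()
```

Passing `samtools_cmd` makes it easy to point at a specific samtools build, for instance one known to be 0.1.13 or newer.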


> 
> Did you see Pierre's little C tool using the samtools API to do this?
> http://plindenbaum.blogspot.com/2011/02/testing-if-bam-file-is-sorted-using.html


Yes; however, in testing, a 6.6 GB BAM file took 138 seconds to check with the
posted 'bamsorted' code (which uses the SAMtools API) and 128 seconds to index
with SAMtools, so we're using samtools for the check.
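
For comparison, here is the record-by-record idea behind Pierre's tool, as we understand it from the blog post. It is written in Python rather than in C against the samtools API. The records are plain `(reference_id, position)` tuples so that the sketch runs on its own. In practice they would be read from the BAM, for example via pysam's `reference_id` and `reference_start` attributes. The check can stop at the first out-of-order record. A file that is sorted, however, still has to be read end to end, which is consistent with the two timings above being so close.

```python
import math


def is_coordinate_sorted(records):
    """Return True if (reference_id, position) pairs are in coordinate order.

    Uses the BAM convention that reads with no reference
    (reference_id == -1) come after all placed reads.
    """
    previous = None
    for ref_id, pos in records:
        # Map "no reference" to +infinity so unplaced reads sort last.
        key = (math.inf if ref_id < 0 else ref_id, pos)
        if previous is not None and key < previous:
            return False  # stop at the first out-of-order record
        previous = key
    return True


print(is_coordinate_sorted([(0, 100), (0, 250), (1, 5), (-1, -1)]))  # True
print(is_coordinate_sorted([(1, 5), (0, 100)]))                      # False
```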


> 
>> The only disadvantage is that you need a new samtools for it to
>> work on 100% of cases but that seems like a good choice moving
>> forward.
> 
> Yes, since Galaxy will typically build the index anyway, it makes
> sense to try and do the indexing immediately, and thus find out if
> a sort is required or not.
> 
> Meanwhile, the following trivial patch resolves my problem with
> getting pre-existing BAM files loaded into Galaxy:
> 
> https://bitbucket.org/peterjc/galaxy-central/changeset/7f17701740b2
> 
> As a follow up, Galaxy doesn't need to re-index the file if there
> is already a BAI index. However, making it do this seems to mean
> knowing a bit more about how Galaxy deals with its metadata.
> 
> Peter
> 
> ___________________________________________________________
> Please keep all replies on the list by using "reply all"
> in your mail client.  To manage your subscriptions to this
> and other Galaxy lists, please use the interface at:
> 
>  http://lists.bx.psu.edu/
> 
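
On the follow-up about not re-indexing when a BAI index already exists, the file-level part of that check might look like the sketch below. It assumes the two common naming conventions (myfile.bam.bai and myfile.bai) and treats an index older than the BAM as stale. The function name is hypothetical. How the result would feed into Galaxy's metadata handling is not shown.

```python
import os


def existing_bai(bam_path):
    """Return the path of an up-to-date BAI index for bam_path, or None.

    Checks myfile.bam.bai and myfile.bai, and ignores an index whose
    modification time is older than the BAM's (assumed to be stale).
    Raises FileNotFoundError if bam_path itself does not exist.
    """
    stem, _ = os.path.splitext(bam_path)
    candidates = [bam_path + ".bai", stem + ".bai"]
    bam_mtime = os.path.getmtime(bam_path)
    for candidate in candidates:
        if os.path.exists(candidate) and os.path.getmtime(candidate) >= bam_mtime:
            return candidate
    return None
```

If this returns a path, the indexing step could in principle be skipped. If it returns None, the file would be indexed as usual.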

Greg Von Kuster
Galaxy Development Team
g...@bx.psu.edu



