Peter and Greg;

> > 2. Determine if a BAM file is sorted before it is introduced into the
> > Galaxy environment so that it will only be sorted if necessary. We have
> > a very simple test for this in the Bam class's _is_coordinate_sorted()
> > method in ~/lib/galaxy/datatypes/binary.py, but this method obviously
> > needs improvements. The improved implementation is a bit non-trivial,
> > but it is high priority, so should be completed soon. In the meantime,
> > Bam files cannot be uploaded to a data library using the combinations
> > of options described in 1 above if they do not pass the current simple,
> > rigid test in the Bam class's method.
> I was thinking about this over the weekend, and perhaps you could
> assume (for the special case of a library import where the file is being
> linked to) that if the BAI index file already exists then the BAM file
> should be sorted already. i.e. Use both the BAM and BAI files as
> provided.

I added that initial sorted test and agree that it is imperfect.
Several tools sort the files but do not set the SO: header, since
it's not required by the spec. We recently had a discussion about this:

http://biostar.stackexchange.com/questions/5273/is-my-bam-file-sorted

I believe the new 0.1.13 samtools has the fixes Heng mentioned in the
comments thread, so a good way to check for sorting is to run
'samtools index your.bam' and check the exit code. It will complain
about non-sorted files. The only disadvantage is that you need a new
samtools for it to work in 100% of cases, but that seems like a good
choice moving forward.

Brad
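A minimal sketch of the 'samtools index' check described above, in Python.
It assumes a samtools binary new enough to reject unsorted input is on the
PATH. The function name bam_is_indexable and its parameters are hypothetical
and are not part of Galaxy's Bam class. Note that a non-zero exit code can
also mean a missing or corrupt file, so this is only a proxy for "sorted".

```python
import subprocess


def bam_is_indexable(bam_path, samtools="samtools", run=subprocess.run):
    """Return True if `samtools index <bam_path>` exits with status 0.

    Hypothetical helper, not Galaxy code. On success samtools also writes
    an index file next to the BAM as a side effect. If the samtools
    executable cannot be found, subprocess raises FileNotFoundError.
    """
    result = run(
        [samtools, "index", bam_path],
        stdout=subprocess.PIPE,  # capture output so it does not clutter logs
        stderr=subprocess.PIPE,
    )
    # Exit status 0 means indexing succeeded; anything else is treated as
    # "not usable as a coordinate-sorted BAM".
    return result.returncode == 0
```

A caller could then sort only when bam_is_indexable("your.bam") returns
False. The run parameter exists only so the command runner can be swapped
out, for example in tests.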