>> Greg wrote:
>> > Breaking this issue into the following 2 parts, here is the status.
>> >
>> > 1. Don't alter the contents of files being uploaded to a data library
>> > if using the "upload_directory" or "upload_paths" options in
>> > conjunction with the "Link to files without copying into Galaxy"
>> > option.  This issue has been resolved in change set 5221:b5ecb8f4839d.

I've just updated my test Galaxy instance to get the 5221:b5ecb8f4839d
fix, and I now get a different behaviour - still an error state.

Data type: auto
Build: ?
Miscellaneous information: The uploaded files need grooming, so
change your Copy data into Galaxy? selection to be Copy files into
Galaxy instead of Link to files without copying into Galaxy so
grooming can be performed.
error

Presumably Galaxy uses 'Grooming' in several settings (e.g.
FASTQ) to mean 'data sanitising', and what that message is
trying to tell me is Galaxy doesn't think my BAM file is sorted
(and therefore needs 'grooming'). Right?

On Tue, Mar 15, 2011 at 2:39 PM, Brad Chapman <chapm...@50mail.com> wrote:
> Peter and Greg;
>
>> > 2. Determine if a BAM file is sorted before it is introduced into the
>> > Galaxy environment so that it will only be sorted if necessary.  We have
>> > a very simple test for this in the Bam class's _is_coordinate_sorted()
>> > method in ~/lib/galaxy/datatypes/binary.py, but this method obviously
>> > needs improvements.  The improved implementation is a bit non-trivial,
>> > but it is high priority, so should be completed soon.  In the meantime,
>> > Bam files cannot be uploaded to a data library using the combinations
>> > of options described in 1 above if they do not pass the current simple,
>> > rigid test in the Bam class's method.
>>
>> I was thinking about this over the weekend, and perhaps you could
>> assume (for the special case of a library import where the file is being
>> linked to) that if the BAI index file already exists then the BAM file
>> should be sorted already. i.e. Use both the BAM and BAI files as
>> provided.
>
> I added in that initial sorted test and agree that it is imperfect.
> Several tools sort the files but do not set the SO: header since
> it's not required by the spec.

Having checked my BAM files with samtools, I can confirm they
don't have the SO header.

samtools view -H myfile.bam | grep "SO:"

They were generated with BWA in a split+merge pipeline to use
multiple cores. I suppose I could run samtools reheader on them...
but it would be nice to avoid that.
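
For anyone wanting to script the same header check, here is a minimal
Python sketch. It assumes samtools is on the PATH; the function names
are made up for illustration and are not part of Galaxy. Unlike the
plain grep, it only looks at the @HD line, so an "SO:" that happens to
appear in an @PG or @CO line is not mistaken for a sort-order tag.

```python
import subprocess


def header_sort_order(header_text):
    """Return the SO value from the @HD line of a SAM header, or None."""
    for line in header_text.splitlines():
        if line.startswith("@HD"):
            # Header fields are tab separated, e.g. "@HD\tVN:1.0\tSO:coordinate"
            for field in line.split("\t")[1:]:
                if field.startswith("SO:"):
                    return field[3:]
    return None


def bam_sort_order(bam_path):
    """Hypothetical helper: read the header with 'samtools view -H'."""
    header_text = subprocess.check_output(
        ["samtools", "view", "-H", bam_path], universal_newlines=True
    )
    return header_sort_order(header_text)
```

A return value of None only means the tag is absent, as with my BWA
files; it says nothing about whether the reads are actually sorted.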

> We recently had a discussion about this:
>
> http://biostar.stackexchange.com/questions/5273/is-my-bam-file-sorted
>
> I believe the new 0.1.13 samtools has the fixes Heng mentioned in
> the comments thread so a good process to check for sorting is to do
> 'samtools index your.bam' and check the error code. It will complain
> for non-sorted files.

Did you see Pierre's little C tool using the samtools API to do this?
http://plindenbaum.blogspot.com/2011/02/testing-if-bam-file-is-sorted-using.html

> The only disadvantage is that you need a new samtools for it to
> work on 100% of cases but that seems like a good choice moving
> forward.

Yes, since Galaxy will typically index the file anyway, it makes
sense to try the indexing immediately, and thus find out whether
a sort is required.
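
A rough Python sketch of that "index first, sort only on failure" check
follows. It assumes samtools is on the PATH and is recent enough to
return a non-zero exit status for unsorted input, as Brad describes;
the function name and the "run" parameter are hypothetical and only
there so the check can be exercised without samtools installed.

```python
import subprocess


def passes_samtools_index(bam_path, run=subprocess.call):
    """Return True if 'samtools index bam_path' exits with status 0.

    Assumption: a non-zero exit status means samtools refused to index
    the file, for example because it is not coordinate sorted.
    """
    return run(["samtools", "index", bam_path]) == 0
```

Note that samtools index writes its .bai file next to the BAM, so for
linked library files this needs write access to that directory.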

Meanwhile, the following trivial patch resolves my problem with
getting pre-existing BAM files loaded into Galaxy:

https://bitbucket.org/peterjc/galaxy-central/changeset/7f17701740b2

As a follow up, Galaxy doesn't need to re-index the file if there
is already a BAI index. However, making it skip that step seems to
require knowing a bit more about how Galaxy deals with its metadata.
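
Finding the existing index is the easy part. A minimal Python sketch,
assuming the index sits beside the BAM file under one of the two usual
names (file.bam.bai as written by samtools, or file.bai as written by
some other tools); the function name is hypothetical, and this does not
touch Galaxy's metadata handling at all:

```python
import os


def find_existing_bai(bam_path):
    """Return the path of a BAI index beside bam_path, or None."""
    candidates = [
        bam_path + ".bai",                       # e.g. reads.bam.bai
        os.path.splitext(bam_path)[0] + ".bai",  # e.g. reads.bai
    ]
    for candidate in candidates:
        if os.path.isfile(candidate):
            return candidate
    return None
```

It does not check that the index is newer than the BAM file or that it
actually matches it, so a stale index would still be picked up.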

Peter
