Hi Peter,

On Nov 18, 2013, at 10:33 AM, Peter Cock <p.j.a.c...@googlemail.com> wrote:

> On Mon, Nov 18, 2013 at 2:24 PM, Dave Bouvier <d...@bx.psu.edu> wrote:
>> Peter,
>> 
>> It turns out there were two problems. First, the test environment was not
>> resolving the upload tool's dependency on samtools, which I've now
>> corrected.
> 
> Excellent.
> 
> On a closely related point, I understand Galaxy likes to store all
> BAM files co-ordinate sorted and indexed - when a tool produces
> a BAM file where does this happen? i.e. Is it the individual tool's
> responsibility, or the framework (e.g. during setting metadata).
> I assume the latter, in which case is there still an implicit
> samtools dependency there?

This is (unfortunately) performed in multiple methods of the Bam class in 
~/galaxy/datatypes/binary.py.  The Bam class's dataset_content_needs_grooming() 
method has some comments (pasted here), including an old "TODO", that clarify 
some of the reasons for this:

    # Samtools version 0.1.13 or newer produces an error condition when attempting to index an
    # unsorted bam file - see http://biostar.stackexchange.com/questions/5273/is-my-bam-file-sorted.
    # So when using a newer version of samtools, we'll first check if the input BAM file is sorted
    # from the header information.  If the header is present and sorted, we do nothing by returning False.
    # If it's present and unsorted or if it's missing, we'll index the bam file to see if it produces the
    # error.  If it does, sorting is needed so we return True (otherwise False).
    #
    # TODO: we're creating an index file here and throwing it away.  We then create it again when
    # the set_meta() method below is called later in the job process.  We need to enhance this overall
    # process so we don't create an index twice.  In order to make it worth the time to implement the
    # upload tool / framework to allow setting metadata from directly within the tool itself, it should be
    # done generically so that all tools will have the ability.  In testing, a 6.6 gb BAM file took 128
    # seconds to index with samtools, and 45 minutes to sort, so indexing is relatively inexpensive.

> 
>> Second, the bam file detection on upload was broken due to the
>> bug in python 2.7.4's gzip module, which I've also corrected.
> 
> You mean http://bugs.python.org/issue17666 fixed in 2.7.5?

Yes

> 
> I reported that when Biopython's BGZF support broke (BGZF
> being the gzip flavour used for BAM and tabix style indexed files).

Thanks!

> 
>> I have re-run the test framework on samtools_idxstats, and it has
>> now passed its test.
>> 
>>   --Dave B.
> 
> Thanks Dave :)
> 
> Peter
> ___________________________________________________________
> Please keep all replies on the list by using "reply all"
> in your mail client.  To manage your subscriptions to this
> and other Galaxy lists, please use the interface at:
>  http://lists.bx.psu.edu/
> 
> To search Galaxy mailing lists use the unified search at:
>  http://galaxyproject.org/search/mailinglists/
