Hi all,
 
I appreciate all of the discussion related to this issue. I still don't 
understand why I see this issue only when I choose the hg_g1k_v37 format 
and not when I choose the hg_19 format. I realize that I would need to ensure 
that the BAM files are sorted correctly before I enter the GATK pipeline, but 
all of this happens before that step.
 
When my read files are processed through to .bam files using the hg_19 format, 
I can view them in IGV without a problem. It is only when I use the hg_g1k_v37 
format that I receive an error from IGV. It seems to me that the process that I 
am using in Galaxy should be identical except for the reference genome format 
(i.e. hg_19 or hg_g1k_v37).
 
I am at a loss as to how to proceed. Does anyone have ideas?
 
Thanks,
Mike



--- On Thu, 10/27/11, Jim Robinson <jrobi...@broadinstitute.org> wrote:


From: Jim Robinson <jrobi...@broadinstitute.org>
Subject: Re: [galaxy-user] Problem with bam and/or bai files
To: "Peter Cock" <p.j.a.c...@googlemail.com>
Cc: "Galaxy Dev" <galaxy-...@bx.psu.edu>, "Mike Dufault" <dufau...@yahoo.com>, 
"galaxy-user" <galaxy-u...@lists.bx.psu.edu>
Date: Thursday, October 27, 2011, 9:58 AM


  It's possible the sorting problem was specific to one samtools version 
and that samtools now gives an error.  The incorrect index caused by bad 
sequence lengths is a recurrent problem, but I do not know what tool 
produces such headers.  Perhaps someone who has experienced this can chime in.
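
For the sequence-length case, one way to check a BAM before indexing is to 
compare the @SQ lines in its header with the .fai index of the reference it 
was supposedly aligned to. Below is a minimal Python sketch, assuming the 
header text comes from "samtools view -H" and the .fai from "samtools faidx". 
The function names are made up for illustration, and this is not what 
Galaxy, IGV or Picard actually do internally.

```python
def parse_sq_lines(header_text):
    """Return {sequence name: length} from the @SQ lines of a SAM header."""
    lengths = {}
    for line in header_text.splitlines():
        if not line.startswith("@SQ"):
            continue
        # Each field after the record type is TAG:VALUE, tab separated.
        fields = dict(f.split(":", 1) for f in line.split("\t")[1:])
        lengths[fields["SN"]] = int(fields["LN"])
    return lengths


def parse_fai(fai_text):
    """Return {sequence name: length} from the text of a FASTA index (.fai)."""
    lengths = {}
    for line in fai_text.splitlines():
        if line.strip():
            # The first two columns of a .fai line are name and length.
            name, length = line.split("\t")[:2]
            lengths[name] = int(length)
    return lengths


def header_mismatches(header_text, fai_text):
    """List header sequences whose name or length disagrees with the reference."""
    bam, ref = parse_sq_lines(header_text), parse_fai(fai_text)
    problems = []
    for name, length in bam.items():
        if name not in ref:
            problems.append(f"{name}: not in reference")
        elif ref[name] != length:
            problems.append(f"{name}: header LN={length}, reference {ref[name]}")
    return problems
```

An empty list means every sequence named in the header exists in the 
reference with the same length. A header that uses "chr1"-style names checked 
against a reference that uses "1"-style names would be reported as 
"not in reference".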

I'm not a samtools expert, just sharing my experience of what has caused 
this error in the past.  It does seem that, as a general rule, these 
index problems result in errors from Picard (which the GATK uses), 
while samtools can fail silently and sometimes give you an unrelated 
query region.
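
Since a silent failure is hard to spot, the sort order can be checked 
independently before indexing. Below is a rough Python sketch, assuming the 
input is SAM text with its header (for example from "samtools view -h") and 
that "sorted" means coordinate order: by position of the sequence in the @SQ 
header, then by POS. The function name is hypothetical.

```python
def first_unsorted_record(sam_text):
    """Return the 1-based number of the first out-of-order alignment, or None."""
    ref_order = {}        # sequence name -> rank in the @SQ header
    previous = (-1, -1)   # (rank, position) of the previous alignment
    record_number = 0
    for line in sam_text.splitlines():
        if line.startswith("@"):
            if line.startswith("@SQ"):
                for field in line.split("\t")[1:]:
                    if field.startswith("SN:"):
                        ref_order[field[3:]] = len(ref_order)
            continue
        if not line.strip():
            continue
        record_number += 1
        columns = line.split("\t")
        rname, pos = columns[2], int(columns[3])  # SAM columns 3 (RNAME) and 4 (POS)
        # Reads with no reference ("*"), or with a name missing from the
        # header, are ranked after every named sequence.
        rank = ref_order.get(rname, len(ref_order))
        current = (rank, pos)
        if current < previous:
            return record_number
        previous = current
    return None
```

A result of None means no record was found out of coordinate order. Any 
number points at the first alignment that sorts before its predecessor, in 
which case the file should be sorted again before it is indexed.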

Jim

> Sending to galaxy-dev ...
>
> On Thu, Oct 27, 2011 at 5:51 AM, Jim Robinson
> <jrobi...@broadinstitute.org>  wrote:
>> Hi Mike,
>>
>> Someone from the Galaxy team can perhaps give some insight on
>> what went wrong; I can comment on the error message from IGV.
>> That error is thrown by Picard, and in every case I've investigated so
>> far it was traced to a problem with the index.
> Useful background re: "Error reading bam file. This usually indicates
> a problem with the index (bai) file. ArrayIndexOutofBoundsException:
> 4682 (4682)."
>
>> The most common causes are (1) a problem with the sequence
>> dictionary in the BAM header itself, specifically incorrect sequence
>> lengths,
> Any idea what tools produce that kind of thing?
>
>> and (2) indexing an unsorted BAM.  Apparently samtools will
>> make invalid indexes from such files without any complaints in
>> both cases.  You can even use samtools tview on such files;
>> it will happily show you some random region when you query.
> That is news to me - I recall "samtools index" being recommended
> as a way to determine if a BAM file was sorted or not (error on
> unsorted, you get an index if it was sorted) and again from
> memory this is what Galaxy uses internally as part of preparing
> BAM files on upload.
>
> Might this be tied to a specific version of samtools? e.g. a
> possible regression?
>

>> I don't see a "Sort" step in your workflow, maybe that's the problem?
>>
>> Please CC me on any reply,  I might miss it in the list.
>>
>> Jim
> Thanks,
>
> Peter
