I've been looking at this all day, and here's what I've figured out.
picard_wrapper.py simply puts the SAM header from the input BAM file
at the top of the BED file.  However, the Picard interval file has
different columns, in this order:
Seq Name, Start Pos (1-based), End Pos, Strand, Interval Name
whereas the BED file uses the format
Seq Name, Start Pos (0-based), End Pos, Name, Score, Strand

So the BED file actually needs to be converted, not just have the SAM
header added.  I wonder if the wrapper should NOT be doing this at all,
and this should instead be a whole different file format.  I see in
datatypes_conf.xml that a picard_interval_list datatype exists, but I'm
not sure it's entirely correct either.  Would it be more appropriate to
have the user upload a correctly formatted file, or should the wrapper
just re-order the BED columns and add 1 to the start pos?
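
To make the second option concrete, here is a minimal sketch of what the
re-ordering might look like.  It is not what picard_wrapper.py does today;
the function names, the default_strand parameter, and the coordinate-based
fallback name are all assumptions made for illustration.  In particular,
what to do when the BED file has no strand column is still an open question,
so the "+" default is only a placeholder.

```python
def bed_to_interval_line(bed_line, default_strand="+"):
    """Convert one tab-separated BED record to one interval-list record.

    Hypothetical helper, not part of picard_wrapper.py.
    """
    fields = bed_line.rstrip("\n").split("\t")
    if len(fields) < 3:
        raise ValueError("BED record needs at least 3 columns: %r" % bed_line)
    seq_name = fields[0]
    # BED start is 0-based; the interval file start is 1-based, so add 1.
    start = int(fields[1]) + 1
    # BED end is exclusive, which is the same number as a 1-based inclusive end.
    end = int(fields[2])
    # Name is BED column 4; the coordinate-based fallback is an assumption.
    if len(fields) > 3 and fields[3]:
        name = fields[3]
    else:
        name = "%s:%d-%d" % (seq_name, start, end)
    # Strand is BED column 6; default_strand is an assumption for files without it.
    if len(fields) > 5 and fields[5] in ("+", "-"):
        strand = fields[5]
    else:
        strand = default_strand
    return "\t".join([seq_name, str(start), str(end), strand, name])


def bed_to_interval_lines(bed_lines, default_strand="+"):
    """Convert an iterable of BED lines, skipping blank, comment and track lines."""
    converted = []
    for line in bed_lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        converted.append(bed_to_interval_line(line, default_strand))
    return converted
```

The SAM header would still have to be written above these lines, as the
wrapper already does.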

On Tue, Jan 10, 2012 at 11:24 AM, Ryan Golhar

> In case anyone is interested I posted a message to samtools-dev and got a
> few responses about it.  The thread is called 'Picard bait/target format
> file for HsMetrics'.  Now, for Galaxy, I think the wrapper should not
> accept the BED file as input as that doesn't work.  I like the idea of a
> new file format (picardBaitTarget or maybe picardIntervalList) as the input
> type.
> If the converter tool adds a header to the BED file, then there is the
> possibility that a user can associate the BED file with the wrong version
> of a genome.  This is what Picard was trying to avoid.  But that doesn't
> mean a user can't manually add the wrong header anyway.  If the BED file is
> missing strand information, I don't think the tool should add it.  I would
> say just leave the rest of the file alone.  If there is no strand
> information, perhaps the user doesn't care about the strand.
> On Mon, Jan 9, 2012 at 6:11 PM, Ross <ross.laza...@gmail.com> wrote:
>> Hi Ryan,
>> Yes, the Picard tool mandates a bizarre bait/target format file for
>> reasons which might best be addressed to the Picard devs - they may
>> have some very good reasons although I can't imagine what they are.
>> :)
>> Yes, automated conversion of any valid Galaxy bed dataset into the
>> strange format required by the Picard tool is a very good idea. We're
>> already half way there because the tool wrapper adds the (IMHO really
>> silly) required SAM header automagically.
>> A new datatype (eg "picardBaitTarget") and an automated converter
>> would make the tool much easier to use - it's far from ideal to force
>> Galaxy users to comply with the strange Picard format requirements if
>> we can automate a converter.
>> I thought about implementing one but stopped when I realized that I am
>> not sure what an automated converter should do if the user supplies a
>> valid Galaxy bed lacking strand information - generally, making up
>> strand is not a good idea. I don't have enough insight into the way
>> the stats are calculated to know whether bad things might happen if
>> (eg) we assume all the bait and target regions are on the + strand if
>> they're not - but if someone can describe how to automate the
>> conversion, it would definitely be an improvement to the usability of
>> the Picard tool.
>> Suggestions welcomed!
>> On Tue, Jan 10, 2012 at 8:03 AM, Ryan Golhar
>> <ngsbioinformat...@gmail.com> wrote:
>> > Hi all - I think there is a problem with the Picard HSMetrics wrapper in
>> > Galaxy.  The wrapper accepts a BAM file and a BED file.  However the BED
>> > file isn't really in BED format...it requires a SAM header before the BED
>> > lines.  This really isn't a BED file format.  I'm not quite sure how Galaxy
>> > should deal with this...maybe a file format specific to Picard-formatted
>> > BED files.
>> >
>> >