Hi Rathi,
If there were any way that you could get SAM/BAM or a format that could be
translated into SAM/BAM, that would be better since we could display any
mismatches between the read sequences and the reference genome. With BED, we
can show the reference genome coords from your data, but not the read sequences
unless they are concatenated to the IDs which would make them quite long for
display.
BTW I noticed that when a read is split across a junction, the two portions
seem to sum to 38 bases not 36 as for the unsplit reads, e.g.:
14954311,
14954346,
-- assuming 1-based coordinates, 14954311..14954346 = 36 bases like read
sequence
14954338,14956285,
14954362,14956297,
-- assuming 1-based coordinates, first block is 14954338..14954362 = 25 bases,
second block is 14956285..14956297 = 13 bases, but read sequence is 36 bases
long so it should probably be 24 + 12
Here is a simple perl script that translates your format into BED12 (you could
modify this to correct for off-by-one on split reads):
--------------------------------------------
usr/bin/env perl
use warnings;
use strict;
while (<>) {
chomp;
my ($ID, $seq1, $seq2, $chrom, $starts1, $stops1, $starts2, $stops2) =
split("\t");
my @starts1 = split(',', $starts1); my @stops1 = split(',', $stops1);
my @starts2 = split(',', $starts2); my @stops2 = split(',', $stops2);
my $blockCount = @starts1 + @starts2;
my $blockStarts = my $blockSizes = '';
my ($strand, @starts, @stops);
if ($starts1[0] > $starts2[0]) {
$strand = '-';
@starts = (@starts2, @starts1);
@stops = (@stops2, @stops1);
} else {
$strand = '+';
@starts = (@starts1, @starts2);
@stops = (@stops1, @stops2);
}
my $start = $starts[0] - 1;
my $end = $stops[$blockCount-1];
for (my $i = 0; $i < $blockCount; $i++) {
$starts[$i] = $starts[$i] - 1;
$blockStarts .= ($starts[$i]) . ',';
$blockSizes .= ($stops[$i] - $starts[$i]) . ',';
}
my $score = 0;
my $itemRgb = '0,0,0';
print join("\t", $chrom, $start, $end, $ID, $score, $strand, $start, $end,
$itemRgb, $blockCount, $blockStarts, $blockSizes) . "\n";
}
--------------------------------------------
After translating to BED12, use our bedToBigBed program to make a bigBed custom
track file.
Hope that helps, and if you have more questions please send them to us at
[email protected].
Angie
----- "Rathi Thiagarajan" <[email protected]> wrote:
> From: "Rathi Thiagarajan" <[email protected]>
> To: "Mary Goldman" <[email protected]>
> Cc: [email protected]
> Sent: Wednesday, May 18, 2011 5:26:23 PM GMT -08:00 US/Canada Pacific
> Subject: Re: [Genome] Visualizing Paired-End Junctions
>
> Hi Mary,
>
> Thanks for your reply. I tried running the soap2sam.pl script however I
> think because the SOAP output that I was given has been modified from the
> original SOAP output (i.e. my files). Therefore, I was wondering, based on
> the file format I outlined earlier, whether it was possible to simply host
> it as a BED file to view it on the browser. I hoping to get some advice in
> preparing my current file to fit a BED/bigBED type format? I have read the
> FAQ Data File Formats, but I am not sure how to handle my files with the
> multiple coordinates to fit into a UCSC compatible format.
>
> Thanks again.
>
> Cheers,
> Rathi
>
_______________________________________________
Genome maillist - [email protected]
https://lists.soe.ucsc.edu/mailman/listinfo/genome