Hi Mary,

Thanks for your reply. I tried running the soap2sam.pl script; however, I think it does not work on my files because the SOAP output that I was given has been modified from the original SOAP output format. Therefore, I was wondering whether, based on the file format I outlined earlier, it would be possible to simply host the data as a BED file to view it in the browser. I am hoping to get some advice on preparing my current file to fit a BED/bigBed type format. I have read the FAQ Data File Formats, but I am not sure how to handle the multiple coordinates in my files so that they fit a UCSC-compatible format.
Thanks again.

Cheers,
Rathi

On Thu, 19 May 2011 04:38:46 +1000, Mary Goldman <[email protected]> wrote:
> Hi Rathi,
>
> One of our engineers recommended converting your output to BAM. In
> particular, please see the soap2sam.pl script at
> http://soap.genomics.org.cn/soapaligner.html
>
> You will also need samtools to convert the SAM to BAM, and to sort and
> build an index for the BAM.
> http://samtools.sourceforge.net/
>
> Our notes about using BAM are here:
> http://genome.ucsc.edu/goldenPath/help/bam.html
>
> I hope this information is helpful. Please feel free to contact the
> mail list again if you require further assistance.
>
> Best,
> Mary
> ------------------
> Mary Goldman
> UCSC Bioinformatics Group
>
>
> On 5/17/11 7:35 AM, Rathi Thiagarajan wrote:
>> Hi there,
>>
>> I was given the following RNASeq paired-end data and looking for ways to
>> visualize it on the genome browser. The file is a processed SOAP
>> aligned,
>> paired-end, Illumina, mapped to hg19.
>>
>> The file contains the following columns (tab delimited):
>> ID seqOne seqTwo chromosome oneStarts oneStops twoStarts twoStops
>>
>> Looking at this example of a paired-end junction:
>>
>> HWUSI-EAS474_21_30E9BAAXX:2:1:766:164 is the ID
>> GCACAGCAGAAGTGTTTTTCTTTTTTTAATGAACAA is the left end
>> GTCCCATGTTGACAATTTGTATGGTTTACTTTTTCA is the right end
>> chr12 is the chromosome
>> 14954338,14956285, are the starts for the left end, which aligns to a
>> junction 14954362,14956297, stops for the left end
>> 14954311, right end start
>> 14954346, right end stop
>>
>> Here is a snapshot of a few lines from the actual file:
>>
>> GA2:1:1:32:1827#0 AAATTAGACAACTGATGTCATGCTGTCTTGGTCTCC
>> GTGGAAACAAGTAATGGAACCAACGCCCTGTGTGTA chr11 16779120, 16779155, 16779542,
>> 16779577,
>> GA2:1:1:34:1274#0 TGGTGACCTTCAAGGAATCTTTGAGGGCCTGGAGCT
>> TCCAGGAGCAGCTCCAGGCCCTCAAAGAGTCCTTGA chr11 71726406, 71726441, 71726415,
>> 71726450,
>>
>> (and a junction read) :
>> GA2:1:1:105:706#0 TGGCAGTGCAAATATCCAAGAAGAGGAAGTTTGTCG
>> CCTGGTTGGTGTAACTCGCACCTCAACTCCAGAGTA chr11 75110593,75111737,
>> 75110621,75111745, 75111807, 75111842,
>>
>>
>> Would really appreciate your advice on how to prepare this file to
>> visualize it the UCSC GB especially with the multiple coordinates for
>> the
>> junction tags. I was advised that BED file might work here however not
>> sure how to set-up the files based on the instructions that was provided
>> in FAQ Data File Formats
>>
>> Thanking you in advance.
>>
>> Cheers,
>> Rathi

_______________________________________________
Genome maillist - [email protected]
https://lists.soe.ucsc.edu/mailman/listinfo/genome
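
One possible way to handle the multiple coordinates is to write each read pair as a single BED12 line, with every aligned segment of both mates as a block. The Python below is only a rough sketch of that idea, not an official UCSC conversion. It assumes the eight-column layout described in the quoted message (ID, seqOne, seqTwo, chromosome, oneStarts, oneStops, twoStarts, twoStops), with fields separated by whitespace and coordinate lists ending in a comma. It also assumes the coordinates are 1-based and inclusive, which should be checked against the read lengths before the output is trusted. Because BED blocks may not overlap, overlapping mates are merged into one block, so the two mates are no longer distinguishable in the display. The function names are made up for illustration.

```python
def parse_coords(field):
    """Turn a list such as '14954338,14956285,' into [14954338, 14956285]."""
    return [int(x) for x in field.split(",") if x.strip()]


def merge_blocks(blocks):
    """Sort half-open (start, end) intervals and merge any that overlap or touch."""
    merged = []
    for start, end in sorted(blocks):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def pair_to_bed12(line):
    """Convert one read-pair line into one tab-separated BED12 line."""
    # Assumption: exactly eight whitespace-separated fields per line.
    read_id, _seq1, _seq2, chrom, s1, e1, s2, e2 = line.split()
    starts = parse_coords(s1) + parse_coords(s2)
    stops = parse_coords(e1) + parse_coords(e2)
    # Assumption: input is 1-based inclusive; BED is 0-based half-open,
    # so only the start is shifted by one.
    blocks = merge_blocks([(s - 1, e) for s, e in zip(starts, stops)])
    chrom_start, chrom_end = blocks[0][0], blocks[-1][1]
    sizes = ",".join(str(e - s) for s, e in blocks)
    offsets = ",".join(str(s - chrom_start) for s, _ in blocks)
    fields = [chrom, chrom_start, chrom_end, read_id, 0, ".",
              chrom_start, chrom_end, "0,0,0", len(blocks), sizes, offsets]
    return "\t".join(str(f) for f in fields)
```

Applied to every line of the input, this would give a BED file that can be loaded as a custom track. For a large file, the output could then be sorted by chromosome and start (for example with sort -k1,1 -k2,2n) and converted with the UCSC bedToBigBed utility and an hg19 chrom.sizes file.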
