Hi Galt,

I'm sorry; that was my typo (I'm a computer science student entering the world of bioinformatics). I had been looking at some of those links and had initially decided to use BLAT to generate a PSL file, parse it, and write a script to convert it to SAM format. I was just wondering whether tools already existed to make the job easier. Since these are traditional Sanger-style reads, most of the next-gen (short-read) aligners aren't suitable for my purposes. I hadn't heard of bigBed and bigWig before, but I'll check them out. The ultimate goal is to get the data into a format that GBrowse tracks accept, and one track in particular uses SAM.

Thank you for the info!
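A minimal Python sketch of the "parse it" step, assuming BLAT's default PSL output with the column order given in the PSL listing further down. The function names `psl_records` and `sq_header` are hypothetical. The first skips the short psLayout header BLAT writes unless run with `-noHead` and yields the tab-separated columns of each data line; the second collects each target's name and size (tName and tSize) into SAM `@SQ` header lines, since turning SAM into BAM needs the reference names and lengths.

```python
from typing import Dict, Iterable, Iterator, List


def psl_records(lines: Iterable[str]) -> Iterator[List[str]]:
    """Yield the tab-separated fields of each PSL data line.

    Header lines (psLayout banner, column titles, dashes, blanks) are
    skipped because their first column is not an integer. pslx lines,
    which carry extra sequence columns, pass through unchanged.
    """
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 21 and fields[0].isdigit():
            yield fields


def sq_header(records: Iterable[List[str]]) -> List[str]:
    """Build SAM @SQ lines from tName (column 14) and tSize (column 15)."""
    sizes: Dict[str, int] = {}
    for f in records:
        sizes.setdefault(f[13], int(f[14]))  # first occurrence of each target wins
    return [f"@SQ\tSN:{name}\tLN:{length}" for name, length in sizes.items()]


# Made-up example input: a header fragment plus two data lines.
example = [
    "psLayout version 3\n",
    "\n",
    "90\t0\t0\t0\t1\t2\t0\t0\t+\tread1\t100\t5\t97\tchr1\t5000\t1000\t1090\t2\t40,50,\t5,47,\t1000,1040,\n",
    "95\t0\t0\t0\t0\t0\t0\t0\t+\tread3\t95\t0\t95\tchr2\t10000\t50\t145\t1\t95,\t0,\t50,\n",
]
for header_line in sq_header(psl_records(example)):
    print(header_line)
```

With a real file, pass an open handle (e.g. `open("reads.psl")`, a hypothetical name) in place of `example`; since `psl_records` is a generator, materialize it with `list(...)` if the records are needed again for the alignment lines.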
Bremen Braun

On Fri, Jan 15, 2010 at 1:16 PM, Galt Barber <[email protected]> wrote:
> Hi, Bremen!
>
> Could you explain what "backend sequences" are?
>
> It seems likely that BLAT could do the job.
> It is good for many kinds of alignment jobs.
>
> However, for very short reads, we recommend
> using a short read aligner like MAQ, etc.
>
> I found this on the samtools site's to-do list:
> -------------
> Converting the PSL format to SAM
> * Priority: Low
> * Difficulty: Easy
> * Background: PSL is widely used by UCSC. Samtools provides a simple
> converter, but it only translates coordinates.
> * Description: Implement a proper converter for PSL. A perl/python
> script would be ideal.
> * Note: It would be better for someone to maintain the converters for
> other formats. It is hard for one person to keep track of the development of
> all the aligners.
> -----------
>
> Background on SAM/BAM
> http://bioinformatics.oxfordjournals.org/cgi/reprint/btp352v1.pdf
>
> BAM is just binary compressed and possibly indexed SAM.
>
> SAM is defined as these required fields:
> #   Name    Description
> 1   QNAME   Query NAME of the read or the read pair
> 2   FLAG    bitwise FLAG (pairing, strand, mate strand, etc.)
> 3   RNAME   Reference sequence NAME
> 4   POS     1-based leftmost POSition of clipped alignment
> 5   MAPQ    MAPping Quality (Phred-scaled)
> 6   CIGAR   extended CIGAR string (operations: MIDNSHP)
> 7   MRNM    Mate Reference NaMe (‘=’ if same as RNAME)
> 8   MPOS    1-based leftmost Mate POSition
> 9   ISIZE   inferred Insert SIZE
> 10  SEQ     query SEQuence on the same strand as the reference
> 11  QUAL    query QUALity (ASCII-33=Phred base quality)
>
> They say that they already have a converter.
> Perhaps it is good enough.
>
> To convert psl to SAM, it might be a little easier
> if you output pslx since it will include the sequence
> itself.
>
> Just off-hand, I'd say the pslx format could supply
> info for fields 1, 4, and 10 of the SAM format.
> Some of the other fields allow * to stand in for unknown values.
>
> What are you hoping to do with the samtools?
>
> The bigBed and bigWig tools from UCSC provide some
> overlap in functionality with SAM/BAM.
>
> PSL FORMAT INFO
> ----------------
> http://genome.ucsc.edu/FAQ/FAQformat#format2
>
> PSL lines represent alignments, and are typically taken from files
> generated by BLAT or psLayout. See the BLAT documentation for more details.
> All of the following fields are required on each data line within a PSL
> file:
>
> 1. matches - Number of bases that match that aren't repeats
> 2. misMatches - Number of bases that don't match
> 3. repMatches - Number of bases that match but are part of repeats
> 4. nCount - Number of 'N' bases
> 5. qNumInsert - Number of inserts in query
> 6. qBaseInsert - Number of bases inserted in query
> 7. tNumInsert - Number of inserts in target
> 8. tBaseInsert - Number of bases inserted in target
> 9. strand - '+' or '-' for query strand. For translated alignments,
>    second '+' or '-' is for genomic strand
> 10. qName - Query sequence name
> 11. qSize - Query sequence size
> 12. qStart - Alignment start position in query
> 13. qEnd - Alignment end position in query
> 14. tName - Target sequence name
> 15. tSize - Target sequence size
> 16. tStart - Alignment start position in target
> 17. tEnd - Alignment end position in target
> 18. blockCount - Number of blocks in the alignment (a block contains no gaps)
> 19. blockSizes - Comma-separated list of sizes of each block
> 20. qStarts - Comma-separated list of starting positions of each block in query
> 21. tStarts - Comma-separated list of starting positions of each block in target
>
> -Galt
>
>> Hello,
>> I am wondering if Blat is recommended for my situation. I am hoping to
>> display backend sequences mapped against genomic sequences and to view the
>> differences using SAM. What comparison tool would you recommend? I see that,
>> as of yet, samtools doesn't convert PSL to SAM, but there is a low-priority
>> open task to do so.
>> The task difficulty has been described as easy, so surely
>> someone has already done this?
>>
>> Thanks,
>> Bremen Braun
>> _______________________________________________
>> Genome maillist - [email protected]
>> https://lists.soe.ucsc.edu/mailman/listinfo/genome
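As a rough illustration of how the PSL columns listed in the thread map onto SAM fields, here is a minimal Python sketch of a one-line converter. It assumes untranslated BLAT DNA output (single-character strand field), and the name `psl_to_sam` is hypothetical. Gapless blocks become `M`, extra read bases between blocks become `I`, extra reference bases become `D`, and unaligned read ends become soft clips (`S`); POS is tStart + 1 because PSL is 0-based, and FLAG is 16 for '-' alignments. For those, BLAT reports qStarts on the reverse-complemented read, which is the orientation SAM stores reverse-strand reads in, so the same walk serves both strands. MAPQ is set to 255 (unavailable), the mate columns (called RNEXT, PNEXT and TLEN in the current SAM specification, MRNM, MPOS and ISIZE in the table quoted above) to `*`, 0, 0, and SEQ/QUAL to `*`. Spliced cDNA alignments might prefer `N` over `D` for long target gaps; this sketch does not attempt that.

```python
def psl_to_sam(psl_line: str) -> str:
    """Hypothetical helper: convert one untranslated BLAT PSL line to a SAM line."""
    f = psl_line.rstrip("\n").split("\t")
    strand, q_name, q_size = f[8], f[9], int(f[10])
    t_name, t_start = f[13], int(f[15])

    def ints(col: str) -> list:
        # PSL list columns end with a trailing comma, e.g. "40,50,"
        return [int(x) for x in col.rstrip(",").split(",")]

    sizes, q_starts, t_starts = ints(f[18]), ints(f[19]), ints(f[20])

    ops = []  # (length, op) pairs; zero-length operations are dropped

    def push(length: int, op: str) -> None:
        if length <= 0:
            return
        if ops and ops[-1][1] == op:  # merge runs of the same operation
            ops[-1] = (ops[-1][0] + length, op)
        else:
            ops.append((length, op))

    push(q_starts[0], "S")  # read bases before the first block
    for i, size in enumerate(sizes):
        push(size, "M")  # gapless block (matches and mismatches alike)
        if i + 1 < len(sizes):
            push(q_starts[i + 1] - (q_starts[i] + size), "I")  # extra read bases
            push(t_starts[i + 1] - (t_starts[i] + size), "D")  # extra reference bases
    push(q_size - (q_starts[-1] + sizes[-1]), "S")  # read bases after the last block
    cigar = "".join(f"{n}{op}" for n, op in ops)

    flag = 16 if strand == "-" else 0  # 0x10: read aligned to the reverse strand
    pos = t_start + 1  # PSL tStart is 0-based; SAM POS is 1-based
    # MAPQ 255 = unavailable; no mate information; SEQ and QUAL unknown
    fields = [q_name, flag, t_name, pos, 255, cigar, "*", 0, 0, "*", "*"]
    return "\t".join(str(x) for x in fields)


# Made-up PSL line: two blocks with 2 extra read bases between them.
psl = ("90\t0\t0\t0\t1\t2\t0\t0\t+\tread1\t100\t5\t97\tchr1\t5000\t1000\t1090"
       "\t2\t40,50,\t5,47,\t1000,1040,")
print(psl_to_sam(psl))
# -> read1  0  chr1  1001  255  5S40M2I50M3S  *  0  0  *  *  (tab-separated)
```

SEQ and QUAL could later be filled in from the original read files (reverse-complemented and reversed, respectively, when FLAG is 16); the soft clips keep the CIGAR length equal to the full read length so that full-length sequences fit.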
