Re: [Genome] formatting of PSL line for query with alignments to plus and minus strands

Jennifer Jackson Thu, 21 May 2009 14:58:05 -0700

Hello Phil,

This is a good question. The help section does not include a PSL output 
example for this data type (mixed strand). "Example 7" for PSL format is 
a technical clarification of the algorithm/file format rather than 
practical output: its individual blocks are too short to align (the 
minimum block length must be around 21 exact bases).

To generate an example, I selected a 200 base region of forward (+) 
strand genomic sequence, created the 200 base complement of that region, 
and manually mixed blocks in two files (each block is 50 bases, +-+-, 
-+-+). I avoided native genomic duplications and known repetitive 
regions. Then I ran BLAT for all four versions, using the web tool with 
database hg18 (Human, May 2006), and saved the output in PSL format. I 
did not include the results (too wide for a good view); however, I 
included the query sequences below so that you could run the query and 
view the output yourself.
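
For anyone who would rather script the block mixing than do it by hand, 
here is a minimal Python sketch. It is not the procedure used for this 
message (that was manual); the function names are made up for 
illustration. It assumes a base-by-base complement with no reversal, 
which is how the posted "compliment" sequence relates to the forward 
one, and the 50 base block size described above.

```python
# Sketch only: complement() and mix() are hypothetical helper names.
COMPLEMENT = str.maketrans("acgtACGT", "tgcaTGCA")

def complement(seq):
    """Base-by-base complement, NOT reversed."""
    return seq.translate(COMPLEMENT)

def blocks(seq, size=50):
    """Split a sequence into consecutive blocks of `size` bases."""
    return [seq[i:i + size] for i in range(0, len(seq), size)]

def mix(first, second, size=50):
    """Alternate blocks: even-numbered from `first`, odd from `second`."""
    a, b = blocks(first, size), blocks(second, size)
    return "".join(a[i] if i % 2 == 0 else b[i] for i in range(len(a)))

# Forward-strand query copied from the sequences in this message.
forward = (
    "gctgctgctgtgatccaggaccagggcgcaccggctcagcctctcacttg"
    "tcagaggccggggaagagaagcaaagcgcaacggtgtggtccaagccggg"
    "gcttctgcttcgcctctaggacatacacgggaccccctaacttcagtccc"
    "ccaaacgcgcaccctcgaagtcttgaactccagccccgcacatccacgcg"
)

comp = complement(forward)
mixed1 = mix(forward, comp)   # +-+-
mixed2 = mix(comp, forward)   # -+-+

for name, seq in [("mixed_strand_genomic1", mixed1),
                  ("mixed_strand_genomic2", mixed2)]:
    print(">" + name)
    print("\n".join(blocks(seq)))
```

To test a true minus-strand segment instead, the complement would also 
need to be reversed (for example `complement(seq)[::-1]`).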

What I noticed is that when such a sequence is run through BLAT, the 
leading block's strand sets the strand for the PSL output and data from 
the opposite strand is essentially ignored. Keep in mind that if an 
actual transcript were this type of data (mixed strand), it would 
indicate a problem or error, either assembly based (e.g. consensus 
sequence) or library construction based (e.g. chimeric EST). If you are 
interested and have the programming/compute resources, it would be 
possible to search the datasets for alignments with gaps, then analyze 
the unaligned gap sequence for mixed strands.
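
As a starting point for that kind of gap search, here is a rough Python 
sketch that reports the unaligned stretches of the query for one PSL 
line. It assumes the standard 21-column tab-separated PSL layout 
(strand in column 9, qSize in column 11, blockSizes and qStarts in 
columns 19 and 20) and that qStarts on a "-" strand line are given in 
reverse-complemented query coordinates. The function name and the two 
PSL lines are invented for illustration; they are not BLAT output.

```python
def unaligned_query_intervals(psl_line):
    """Return (start, end) query intervals not covered by any block.

    Coordinates are 0-based, half-open, on the query as submitted.
    """
    f = psl_line.rstrip("\n").split("\t")
    strand, q_size = f[8], int(f[10])
    sizes = [int(x) for x in f[18].rstrip(",").split(",")]
    q_starts = [int(x) for x in f[19].rstrip(",").split(",")]

    aligned = []
    for start, size in zip(q_starts, sizes):
        if strand[0] == "-":
            # Convert from reverse-complemented to forward coordinates.
            start = q_size - (start + size)
        aligned.append((start, start + size))
    aligned.sort()

    gaps, pos = [], 0
    for start, end in aligned:
        if start > pos:
            gaps.append((pos, start))
        pos = max(pos, end)
    if pos < q_size:
        gaps.append((pos, q_size))
    return gaps

# Made-up PSL lines (hypothetical values, not real results).
plus_line = "\t".join([
    "100", "0", "0", "0", "1", "50", "1", "50", "+",
    "query1", "200", "0", "150",
    "chrN", "1000000", "1000", "1150",
    "2", "50,50,", "0,100,", "1000,1100,",
])
minus_line = plus_line.replace("\t+\t", "\t-\t")

print(unaligned_query_intervals(plus_line))
print(unaligned_query_intervals(minus_line))
```

The reported intervals could then be cut out of the query and run 
through BLAT on their own to see whether they align to the opposite 
strand.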

My test cases contain transcriptome sequence from a characterized gene. 
You could generate a test with longer blocks of mixed data (genomic) to 
possibly create a mixed alignment, if this is something you wish to 
pursue further. For genomic-to-genomic comparisons, our scientists use 
Blastz (see Comparative "speciesX Chain" tracks or Variations "Self 
Chain" for methods).

example gene: Human Gene CHODL (uc002ykv.1) NM_024944  
genomic sequence: (chr21:18,539,021-18,561,558)      
example region: first 200 bases of gene's mRNA

 >forward_strand_genomic hg18_uc002ykv.1_5prime_200bases
gctgctgctgtgatccaggaccagggcgcaccggctcagcctctcacttg
tcagaggccggggaagagaagcaaagcgcaacggtgtggtccaagccggg
gcttctgcttcgcctctaggacatacacgggaccccctaacttcagtccc
ccaaacgcgcaccctcgaagtcttgaactccagccccgcacatccacgcg

 >compliment_strand_genomic hg18_uc002ykv.1_5prime_200bases
cgacgacgacactaggtcctggtcccgcgtggccgagtcggagagtgaac
agtctccggccccttctcttcgtttcgcgttgccacaccaggttcggccc
cgaagacgaagcggagatcctgtatgtgccctgggggattgaagtcaggg
ggtttgcgcgtgggagcttcagaacttgaggtcggggcgtgtaggtgcgc

 >mixed_strand_genomic1 +-+-hg18_uc002ykv.1_5prime_200bases
gctgctgctgtgatccaggaccagggcgcaccggctcagcctctcacttg
agtctccggccccttctcttcgtttcgcgttgccacaccaggttcggccc
gcttctgcttcgcctctaggacatacacgggaccccctaacttcagtccc
ggtttgcgcgtgggagcttcagaacttgaggtcggggcgtgtaggtgcgc

 >mixed_strand_genomic2 -+-+hg18_uc002ykv.1_5prime_200bases
cgacgacgacactaggtcctggtcccgcgtggccgagtcggagagtgaac
tcagaggccggggaagagaagcaaagcgcaacggtgtggtccaagccggg
cgaagacgaagcggagatcctgtatgtgccctgggggattgaagtcaggg
ccaaacgcgcaccctcgaagtcttgaactccagccccgcacatccacgcg

Thanks,
Jennifer Jackson
UCSC Genome Bioinformatics Group

Dagosto, Phil wrote:
> This issue is not related to any particular assembly.
>
>  
>
> Can you clarify for me how a PSL line should be written for a query
> sequence that has non-contiguous alignments to both the plus and minus
> strands of the target? I am referring, specifically, to Example 7 in
> your on-line description of how to prepare custom annotation tracks.
>
>  
>
> I have read the information and get the part about how the align block
> starts and sizes for the alignments to the minus strand should be
> reversed. What's not clear to me is what the final PSL line for this
> alignment would look like.
>
>  
>
> First, would the data for the alignments to the plus and minus strands
> be combined into a single line or reported on separate lines? If the
> data are to be combined, then:
>
>  
>
> 1.       What would be reported as the strand?
>
> 2.       How would the align block data be combined to report the starts
> and sizes?
>
>  
>
> It would be a real help if you could show me what the line (or lines)
> for Example 7 (or an equivalent example) would look like.
>
>  
>
> Thanks very much for your help.
>
>  
>
> Regards,
>
> Phil
>
> _______________________________________________
> Genome maillist  -  [email protected]
> https://lists.soe.ucsc.edu/mailman/listinfo/genome
>   
