Hello helpdesk,

I have another question about blat alignments. Take the case of
BM451627. For some reason it was not in the hg17 dataset, so I
downloaded the sequence and ran blat with -stepSize=5 -minScore=0
-minIdentity=0, which should correspond to the settings used at UCSC.
Checking identity, computed as match/query length, I find a highest
score of about 20%. This does not change much when repmatch is
additionally taken into account.

However, I found BM451627 in the UCSC hg16 database, where it is
reported with a ~98% identity match. Looking closer at the alignment,
in hg16 a ~150nt stretch of the 1244nt sequence aligns with 98%
identity. Probably a couple of bases that changed from hg16 to hg17
are the reason the sequence identity in this 150nt region drops below
the 96% threshold.

My question now is:

Does this hold for all identities? Say a transcript aligns with 1000 nt
at 98% identity in one place and with 100nt at 98% in another place:
will it be put in both places, regardless of how much of the transcript
the alignment covers? In other words, is the identity criterion (96%,
or within 0.5% of the best alignment) applied to match/(Qend-Qstart)?
And if so, what was the motivation for not using the "global identity"
of the query? Did you have bad experiences with transcripts that would
not align that way?
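
To make the two definitions concrete, here is a small sketch of what I
mean by "local" versus "global" identity. The function names and the
numbers are my own illustration (a 1244 nt query with a 150 nt aligned
stretch containing 147 matching bases), not actual blat output and not
necessarily how UCSC computes the filter:

```python
def local_identity(matches, q_start, q_end):
    """Identity over the aligned part of the query: match / (Qend - Qstart)."""
    span = q_end - q_start
    if span <= 0:
        raise ValueError("q_end must be greater than q_start")
    return matches / span


def global_identity(matches, q_size):
    """Identity over the whole query: match / query length."""
    if q_size <= 0:
        raise ValueError("q_size must be positive")
    return matches / q_size


# Illustrative numbers only (assumed, not taken from a real alignment):
# 147 matching bases in a 150 nt aligned stretch of a 1244 nt query,
# with 0-based, half-open query coordinates.
matches, q_start, q_end, q_size = 147, 0, 150, 1244

print(f"local identity:  {local_identity(matches, q_start, q_end):.3f}")
print(f"global identity: {global_identity(matches, q_size):.3f}")
```

With these made-up numbers the same alignment passes a 96% cutoff under
the first definition and fails it by a wide margin under the second,
which is the difference I am asking about.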

Thank you!

micha.

_______________________________________________
Genome maillist  -  [email protected]
http://www.soe.ucsc.edu/mailman/listinfo/genome
