Hello helpdesk, I have another question about blat alignments. Consider the case of BM451627. For some reason it was not in the hg17 dataset, so I downloaded the sequence and ran blat -stepSize=5 -minScore=0 -minIdentity=0, which should correspond to the settings used at UCSC. Checking identity, the best hit I find has only about 20% sequence identity (matches/query length), and this does not change much when repMatch is also taken into account.
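To make the two identity measures at issue concrete, here is a minimal sketch. It assumes the standard PSL column meanings (matches, repMatches, qSize, qStart, qEnd); the class and function names are hypothetical, and this is not UCSC's actual filtering code. The hit below uses hypothetical numbers loosely modeled on the 150 nt / 1244 nt case described in this post, and the 0.96 threshold is the value quoted in the question.

```python
from dataclasses import dataclass


@dataclass
class PslHit:
    """Hypothetical container for a subset of PSL columns."""
    matches: int      # matching bases that are not in repeats
    rep_matches: int  # matching bases that fall in repeats
    q_size: int       # total query length
    q_start: int      # alignment start in the query (0-based)
    q_end: int        # alignment end in the query (exclusive)


def global_identity(hit: PslHit, include_rep: bool = False) -> float:
    """Matched bases divided by the whole query length (qSize)."""
    m = hit.matches + (hit.rep_matches if include_rep else 0)
    return m / hit.q_size


def span_identity(hit: PslHit, include_rep: bool = False) -> float:
    """Matched bases divided by the aligned query span (qEnd - qStart)."""
    m = hit.matches + (hit.rep_matches if include_rep else 0)
    return m / (hit.q_end - hit.q_start)


# Hypothetical hit: 147 matching bases over a 150 nt stretch of a 1244 nt query.
hit = PslHit(matches=147, rep_matches=0, q_size=1244, q_start=500, q_end=650)
THRESHOLD = 0.96  # value quoted in the question, used here only for illustration

print(f"global: {global_identity(hit):.3f}")  # prints "global: 0.118"
print(f"span:   {span_identity(hit):.3f}")    # prints "span:   0.980"
print(span_identity(hit) >= THRESHOLD, global_identity(hit) >= THRESHOLD)  # True False
```

The same short, high-identity block passes a 96% cutoff when identity is measured over the aligned span but fails badly when measured over the whole query, which is exactly the distinction the question turns on.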
However, I found BM451627 in the UCSC hg16 database, where it is reported with a ~98% identity match. Looking closer at the alignment, in hg16 a ~150 nt stretch of the 1244 nt query aligns with 98% identity, and probably a few bases that changed from hg16 to hg17 are responsible for the identity in this 150 nt region dropping below the 96% threshold. My question is: does this hold for all identities? Say a transcript aligns over 1000 nt with 98% identity in one place and over 100 nt with 98% identity in another; will it be placed in both places, regardless of how much of the transcript the alignment covers? In other words, is the identity criterion (96%, or within 0.5% of the best alignment) applied to match/(Qend-Qstart)? And if so, what was the motivation for not using the "global identity" of the query? Did you run into problems with transcripts that did not align well under a global criterion? Thank you!
micha

_______________________________________________
Genome maillist - [email protected]
http://www.soe.ucsc.edu/mailman/listinfo/genome
