Hi, Shlomit! I tried your test sequences which consisted of the following:
A 60 bp nucleotide query and a toy 300 bp nucleotide database consisting of
5 exact or near-exact matches to the 60 bp query:

1. exact match to query
2. 4 mismatches in a row in a block near the end, with a final tail exon of
   8 bp of exact matches
3. 6 mismatches ...
4. 8 mismatches ...
5. another exact match of the entire query

The entire "genome" was 60*5 = 300 bp.

BLAT was not finding anything except the first exact match. Even with
-fastMap, and trying all other settings, it would never find more than the
2 exact matches (#1 and #5 above). Splitting the above db into 5 separately
named sequences did make blat see all 5 alignments, although we shouldn't
need or want to do that.

Since the default %ID = 90%, for 60 bp that allows about 6 mismatched bases.
But we know that blat will treat a large enough (>=6?) run of mismatches as
a gap.

Thinking that in a real genome the alignments wouldn't usually be so close
together, I tried adding some padding by taking a random chunk of DNA and
inserting it between each of the 5 target parts above, so that we have one
much larger target sequence.

I first tried padding with 3k chunks of random sequence, and that worked!
It found all 5, reporting #1 and #5 as perfect matches with score 60,
reporting #2 as score 56 with 4 mismatches, and reporting #3 and #4 as
scores 54 and 52 with a single gap and two exons. This is just what we
would expect to see.

By experimentation, I was able to lower the chunk size of random DNA to
200 bp and still get all 5 alignments. This is just using BLAT with default
settings. When I tried 150 bp chunks, the alignments for #3 and #4 dropped
out of the output.

I am not sure why BLAT has a prejudice against reporting several alignments
for the same query that are so close together on the target, but in
real-world BLAT use, as opposed to this toy problem, BLAT seems to work ok.

-Galt

On Thu, 29 Jan 2009, shlomit farkash wrote:

> Hello,
>
> Sorry, I have another question about using blat for probe design.
> I have probes with lengths ranging from 45 to 60, and I am trying to find
> whether there are other places in the genome with about 10% mismatch (at
> most) besides the 100% match, in order to exclude such a probe from my list
> (it is not considered unique for hybridization purposes). When using 95%
> min identity, I get a hit even with a stretch of 8 mismatches (out of 60),
> since it is considered as a single gap? Can I take these gaps into
> account as well when searching for a hit?
>
> Thanks a lot,
> Shlomit Amar-Farkash
> The Hebrew University, Jerusalem
> _______________________________________________
> Genome maillist - [email protected]
> http://www.soe.ucsc.edu/mailman/listinfo/genome
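
For anyone who wants to repeat the padding experiment from the reply above, here is a minimal sketch of how such a toy target could be built. It is not the script that was actually used: the function names, the fixed random seed, the randomly generated 60 bp query, and the placement of the 6- and 8-mismatch blocks (assumed here to sit just before an 8 bp exact tail, like the 4-mismatch case) are all assumptions made for illustration.

```python
import random

def random_dna(length, rng):
    """Return a random DNA string of the given length."""
    return "".join(rng.choice("ACGT") for _ in range(length))

def mutate_block(seq, start, n):
    """Replace n consecutive bases from `start` with guaranteed mismatches."""
    swap = {"A": "C", "C": "G", "G": "T", "T": "A"}  # never maps a base to itself
    block = "".join(swap[b] for b in seq[start:start + n])
    return seq[:start] + block + seq[start + n:]

def build_target(query, mismatch_runs, pad_len, rng, tail=8):
    """Concatenate query variants, separated by random padding of pad_len bp.

    Each variant carries one run of mismatches that ends `tail` bp before
    the end of the query (assumed layout for all near-exact copies).
    """
    pieces = []
    for i, n in enumerate(mismatch_runs):
        if i > 0:
            pieces.append(random_dna(pad_len, rng))
        start = len(query) - tail - n
        pieces.append(mutate_block(query, start, n) if n else query)
    return "".join(pieces)

rng = random.Random(0)              # fixed seed so the toy data is reproducible
query = random_dna(60, rng)         # stand-in for the real 60 bp probe
target = build_target(query, [0, 4, 6, 8, 0], pad_len=200, rng=rng)

# FASTA text that could be saved as query.fa and target.fa
query_fasta = ">query\n" + query + "\n"
target_fasta = ">target\n" + target + "\n"
```

Once the two FASTA strings are written to files, a default run would look like `blat target.fa query.fa out.psl`. Changing `pad_len` (for example 150, 200, 3000) gives the different spacings discussed above.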
