Hi, Shlomit!

I tried your test sequences which consisted of the
following:

A 60 bp nucleotide query and

a toy 300 bp nucleotide database consisting of
5 exact or near-exact matches to the 60 bp query:

1. exact match to query
2. a block of 4 consecutive mismatches near the end,
  followed by a final tail exon of 8 bp of exact matches
3. 6 mismatches ...
4. 8 mismatches ...
5. another exact match of entire query.

The entire "genome" was 60*5 = 300 bp.

BLAT was not finding anything except the
first exact match.  Even with -fastMap,
and trying all other settings, it would
never find more than the 2 exact matches
(#1 and #5 above).

Splitting the above db into 5 separately named
sequences did make blat see all 5 alignments,
although we shouldn't need or want to do that.

Since the default minimum identity is 90%, for 60 bp
that allows about 6 mismatched bases.  But we know that
blat will treat a large enough (>=6?) run of
mismatches as a gap.
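
To make that arithmetic concrete, here is a minimal sketch of the
mismatch budget implied by an identity cutoff. The function name
max_mismatches is made up for illustration, and this is only the plain
percentage arithmetic; it does not reproduce how BLAT itself scores
gaps or computes identity.

```python
def max_mismatches(query_len: int, min_identity_pct: int = 90) -> int:
    """Largest number of mismatched bases that still meets the cutoff.

    Hypothetical helper: plain percentage arithmetic only, not BLAT's
    own identity calculation.
    """
    # Integer arithmetic avoids floating-point rounding surprises
    # such as 60 * (1 - 0.90) coming out slightly below 6.
    return query_len * (100 - min_identity_pct) // 100


# Probe lengths mentioned in this thread (45 to 60 bp).
for length in (45, 60):
    print(length, "bp ->", max_mismatches(length), "mismatches at 90%")
```

By this count, a 60 bp probe has room for 6 mismatches at 90% and only
3 at 95%.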

Thinking that in a real genome, the alignments
wouldn't usually be so close together,
I tried adding some padding by taking a random
chunk of dna and inserting it between each of
the 5 target parts above so that we have one much larger
target sequence. I first tried padding with 3k chunks
of random sequence, and that worked!
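
For anyone who wants to repeat the padding experiment, the following is
a rough sketch of how such a padded target could be built. It is not
the script I actually used; the function names, the file names and the
fixed random seed are all assumptions made for illustration.

```python
import random


def random_dna(length, rng):
    """Return a random string of A/C/G/T of the given length."""
    return "".join(rng.choice("ACGT") for _ in range(length))


def build_padded_target(segments, pad_len, seed=0):
    """Join segments into one sequence with random padding between them.

    Padding goes only between neighbouring segments, not at either end.
    A fixed seed keeps the padding reproducible between runs.
    """
    rng = random.Random(seed)
    parts = []
    for i, segment in enumerate(segments):
        if i > 0:
            parts.append(random_dna(pad_len, rng))
        parts.append(segment)
    return "".join(parts)


def write_fasta(path, name, seq, width=60):
    """Write a single-record FASTA file, wrapped at `width` columns."""
    with open(path, "w") as fh:
        fh.write(f">{name}\n")
        for i in range(0, len(seq), width):
            fh.write(seq[i:i + width] + "\n")


# Example use (hypothetical file names; `segments` would hold the five
# 60 bp target parts described above):
#
#   target = build_padded_target(segments, pad_len=200)
#   write_fasta("padded_target.fa", "padded_target", target)
#
# and then, with default settings:
#
#   blat padded_target.fa query.fa out.psl
```

Changing pad_len makes it easy to compare different chunk sizes.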

It found all 5, reporting #1 and #5 as perfect
matches score 60, reporting #2 as score 56 with
4 mismatches, reporting #3 and #4 as scores
54 and 52 with a single gap and two exons.
This is just what we would expect to see.

By experimentation, I was able to lower
the chunksize of random dna to 200bp
and still get all 5 alignments.
This is just using BLAT with default settings.

When I tried 150bp chunks, alignments for
#3 and #4 dropped out of the output.

I am not sure why BLAT has a prejudice against
reporting several alignments for the same query that
are so close together on the target, but in real-world BLAT use,
instead of this toy problem,
BLAT seems to work ok.

-Galt


On Thu, 29 Jan 2009, shlomit farkash wrote:

> Hello,
>
> Sorry, I have another question about using blat for probe design. I have 
> probes with length ranging from 45 to 60, I am trying to find whether 
> there are other places in the genome with about 10% mismatch (the most) 
> besides the 100% match to exclude this probe from my list (it is not 
> considered unique for hybridization purposes). when using 95% min 
> identity, I get a hit even with a stretch of 8 mismatches (out of 60), 
> since it is considered as a single gap ? can I take these gaps into 
> account as well when searching for a hit?
>
> Thanks a lot,
> Shlomit Amar-Farkash
> The Hebrew university, Jerusalem
> _______________________________________________
> Genome maillist  -  [email protected]
> http://www.soe.ucsc.edu/mailman/listinfo/genome
>
