Re: [Genome] understanding the effect of stepSize of BLAT

Galt Barber Tue, 27 Apr 2010 19:45:28 -0700

Hi, Peng!

Note: when running your tests, you may wish to set -minScore=0
to avoid any confusion about which version of blat you have
and exactly how the minScore will be handled.

Your example shows off blat in a textbook-perfect fashion.

The 25 positions are 0 to 24.
The mismatch is at position 12. (in the middle)
The default tileSize is 11.
The default minMatch is 2: there must be two exact tile matches
on the same diagonal within the distance covered by the query.

Step size 2 target database tiles.
0-10
2-12 *
4-14 *
6-16 *
8-18 *
10-20 *
12-22 *
14-24

* indicates that the tile covers the substitution at position 12,
so it will not produce a matching hit in the index.
Step size 2 leaves two tiles that can match, one at the beginning
(position 0) and one at the end (position 14).

Step size 3 target database tiles.
0-10
3-13 *
6-16 *
9-19 *
12-22 *

Only one tile (0-10) can match, so there is no pair of tile hits
close together on the same diagonal, and minMatch=2 is not satisfied.
Thus no alignment is found.
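
The two tilings above can be reproduced with a short Python sketch. It is a toy model of the tiling arithmetic only, not blat's actual code: the helper names (target_tiles, clean_tiles) are made up for illustration, positions are 0-based with inclusive ends as in the lists above, and counting usable tiles stands in for the real minMatch test (which also requires the hits to lie near each other on one diagonal, as they do here).

```python
def target_tiles(target_len, tile_size=11, step_size=2):
    """Return (start, end) pairs, end inclusive, for tiles that fit in the target."""
    return [(s, s + tile_size - 1)
            for s in range(0, target_len - tile_size + 1, step_size)]

def clean_tiles(tiles, mismatch_pos):
    """Keep only the tiles that do not cover the mismatch position."""
    return [(s, e) for (s, e) in tiles if not (s <= mismatch_pos <= e)]

TARGET_LEN = 25   # positions 0 to 24
MISMATCH = 12     # position of the substitution
MIN_MATCH = 2     # default minMatch, as stated above

for step in (2, 3):
    tiles = target_tiles(TARGET_LEN, 11, step)
    ok = clean_tiles(tiles, MISMATCH)
    print(f"stepSize={step}: {len(ok)} usable tile(s) {ok}, "
          f"enough for minMatch={MIN_MATCH}: {len(ok) >= MIN_MATCH}")
```

With stepSize=2 the sketch keeps tiles 0-10 and 14-24; with stepSize=3 it keeps only 0-10.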

The query is tiled at every position (its tiles advance by 1),
so any target tile that is free of the mismatch will be found.
The same thing then happens with the reverse complement of
the query.
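
As a rough illustration of that lookup, the sketch below indexes the target tiles at a given step size and then slides over the query one base at a time, on both strands. It is a simplified model under stated assumptions, not blat's implementation: build_index, find_hits and revcomp are hypothetical helper names, and only exact tile hits are reported (no extension or scoring).

```python
from collections import defaultdict

TILE_SIZE = 11

def revcomp(seq):
    """Reverse complement of a lowercase DNA string."""
    return seq.translate(str.maketrans("acgt", "tgca"))[::-1]

def build_index(target, step_size, tile_size=TILE_SIZE):
    """Map each target tile (sampled every step_size bases) to its start positions."""
    index = defaultdict(list)
    for t in range(0, len(target) - tile_size + 1, step_size):
        index[target[t:t + tile_size]].append(t)
    return index

def find_hits(query, index, tile_size=TILE_SIZE):
    """Slide over the query one base at a time; return exact (qStart, tStart) hits."""
    hits = []
    for q in range(len(query) - tile_size + 1):
        for t in index.get(query[q:q + tile_size], []):
            hits.append((q, t))
    return hits

query = "cttgcaccggaatgtctgctccaga"
target = "cttgcaccggaaagtctgctccaga"

for step in (2, 3):
    index = build_index(target, step)
    print(f"stepSize={step}: + strand {find_hits(query, index)}, "
          f"- strand {find_hits(revcomp(query), index)}")
```

For these two sequences the plus-strand hits are (0, 0) and (14, 14) at stepSize=2, both on diagonal 0, and only (0, 0) at stepSize=3; the reverse complement produces no hits in either case.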

Note that if you are truly working with very short
sequences you could also consider reducing the
tileSize.  You could also reduce minMatch to 1.
However, both of these changes would slow blat down
on large target databases.
There is a sensitivity/speed trade-off with
alignment search tools in general.
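
To see why a smaller tileSize or minMatch=1 would help with this particular example, here is a small counting sketch at stepSize=3. It is only a toy model: usable_tiles is a hypothetical helper, the tile size of 8 is picked for illustration, and the count says nothing about the speed cost mentioned above.

```python
def usable_tiles(target_len, mismatch_pos, tile_size, step_size):
    """Count target tiles (sampled every step_size bases) that avoid the mismatch."""
    starts = range(0, target_len - tile_size + 1, step_size)
    return sum(1 for s in starts if not (s <= mismatch_pos < s + tile_size))

# (tileSize, minMatch) combinations to compare, all at stepSize=3
for tile_size, min_match in [(11, 2), (11, 1), (8, 2)]:
    n = usable_tiles(25, 12, tile_size, step_size=3)
    print(f"tileSize={tile_size} minMatch={min_match}: "
          f"{n} usable tile(s), seed found: {n >= min_match}")
```

In this model, tileSize=11 leaves one usable tile at stepSize=3 (enough only for minMatch=1), while tileSize=8 leaves three.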

This is the basic picture for nucleotide target and query blat.

-Galt

>>> On 4/25/10 2:20 PM, Peng Yu wrote:
>>>>
>>>> Hi,
>>>>
>>>> I have the following two sequences. The query differs from the
>>>> database by one nucleotide at position 13.
>>>> $ cat query.fasta
>>>> >test_sequence
>>>> cttgcaccggaatgtctgctccaga
>>>> $ cat database.fasta
>>>> >database_chr1
>>>> cttgcaccggaaagtctgctccaga
>>>>
>>>>
>>>> Then I run blat with the following commands.
>>>>
>>>> blat -t=dna -q=dna -stepSize=2 -minScore=25 -maxGap=1 -out=pslx database.fasta query.fasta query2.pslx
>>>> blat -t=dna -q=dna -stepSize=3 -minScore=25 -maxGap=1 -out=pslx database.fasta query.fasta query3.pslx
>>>>
>>>> The resulting files are the following. I understand that stepSize is
>>>> the offset between the K-mers in the database. But I still don't
>>>> understand why stepSize has to be less than or equal to 2 to detect
>>>> this query in the database. Could you help me understand it?
>>>>
>>>> $ cat query2.pslx
>>>> psLayout version 3
>>>>
>>>> match mis-  rep.  N's Q gap Q gap T gap T gap strand Q    Q    Q     Q   T    T    T     T   block blockSizes qStarts tStarts
>>>>       match match     count bases count bases        name size start end name size start end count
>>>> ---------------------------------------------------------------------------------------------------------------------------------------------------------------
>>>> 24 1 0 0 0 0 0 0 + test_sequence 25 0 25 database_chr1 25 0 25 1 25, 0, 0, cttgcaccggaatgtctgctccaga, cttgcaccggaaagtctgctccaga,
>>>> $ cat query3.pslx
>>>> psLayout version 3
>>>>
>>>> match mis-  rep.  N's Q gap Q gap T gap T gap strand Q    Q    Q     Q   T    T    T     T   block blockSizes qStarts tStarts
>>>>       match match     count bases count bases        name size start end name size start end count
>>>> ---------------------------------------------------------------------------------------------------------------------------------------------------------------
>>>>
>>>>
>>>>

_______________________________________________
Genome maillist  -  [email protected]
https://lists.soe.ucsc.edu/mailman/listinfo/genome
