Hello Peng,

These sequences are short. In order to get a minScore of 25, very short 
windows of matching sequence must be squeezed together along the query.

You are at the very lower edge of sequence length for BLAT to function. 
As noted in the BLAT doc, at this sequence length legitimate matches may 
even be missed. That is just how the program is designed - not optimized 
for very short sequences.

If you have a lot of short sequences (next gen?), then using one of the 
aligners specifically designed for that type of query is probably a 
better, not to mention quicker, option.

A quick google can bring up several alignment tool options for short 
sequences. Which to use will be your decision and may require some 
experimentation to find the best for your project.

Thanks,
Jennifer


---------------------------------
Jennifer Jackson
UCSC Genome Informatics Group
http://genome.ucsc.edu/

On 4/25/10 2:20 PM, Peng Yu wrote:
> Hi,
>
> I have the following two sequences. The query has one nucleotide
> missing at position 13 compared with the database.
> $ cat query.fasta
>> test_sequence
> cttgcaccggaatgtctgctccaga
> $ cat database.fasta
>> database_chr1
> cttgcaccggaaagtctgctccaga
>
>
> Then I run blast with the following command.
>
> blat -t=dna -q=dna -stepSize=2 -minScore=25 -maxGap=1 -out=pslx
> database.fasta query.fasta query2.pslx
> blat -t=dna -q=dna -stepSize=3 -minScore=25 -maxGap=1 -out=pslx
> database.fasta query.fasta query3.pslx
>
> The resulted files are the following. I understand that stepSize is
> the offset between the K-mers in the database. But I still don't
> understand why stepSize has to be less than or equal to 2 to detect
> this query in the database. Could you help me understand it?
>
> $ cat query2.pslx
> psLayout version 3
>
> match mis-    rep.    N's     Q gap   Q gap   T gap   T gap   strand  Q       
>         Q       Q
>       Q       T               T       T       T       block   blockSizes      
> qStarts  tStarts
>               match   match           count   bases   count   bases           
> name
>       size    start   end     name            size    start   end     count
> ---------------------------------------------------------------------------------------------------------------------------------------------------------------
> 24    1       0       0       0       0       0       0       +       
> test_sequence   25      0       25      database_chr1   25      0       25    
>   1       25,     0,      0,      cttgcaccggaatgtctgctccaga,      
> cttgcaccggaaagtctgctccaga,
> $ cat query3.pslx
> psLayout version 3
>
> match mis-    rep.    N's     Q gap   Q gap   T gap   T gap   strand  Q       
>         Q       Q
>       Q       T               T       T       T       block   blockSizes      
> qStarts  tStarts
>               match   match           count   bases   count   bases           
> name
>       size    start   end     name            size    start   end     count
> ---------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>
_______________________________________________
Genome maillist  -  [email protected]
https://lists.soe.ucsc.edu/mailman/listinfo/genome

Reply via email to