Re: [Genome] understanding the effect of stepSize of BLAT

Peng Yu Tue, 27 Apr 2010 19:47:51 -0700

Hi Jennifer,

I made up the example to understand how BLAT works internally not for
short sequencing mapping. I made the query and the database short
deliberately.


I don't think that I understand enough the details of the algorithm
after reading BLAT paper so that I can explain why the match is
missing for this particular case. Would you please help me understand
how BLAT work for this case of stepSize=3 but not 2?

On Mon, Apr 26, 2010 at 4:06 PM, Jennifer Jackson <[email protected]> wrote:
> Hello Peng,
>
> These sequences are short. In order to get a minScore of 25, very short
> windows of matching sequence must be squeezed together along the query.
>
> You are at the very lower edge of sequence length for BLAT to function. As
> noted in the BLAT doc, at this sequence length legitimate matches may even
> be missed. That is just how the program is designed - not optimized for very
> short sequences.
>
> If you have a lot of short sequences (next gen?), then using one of the
> aligners specifically designed for that type of query is probably a better,
> not to mention quicker, option.
>
> A quick google can bring up several alignment tool options for short
> sequences. Which to use will be your decision and may require some
> experimentation to find the best for your project.
>
> Thanks,
> Jennifer
>
>
> ---------------------------------
> Jennifer Jackson
> UCSC Genome Informatics Group
> http://genome.ucsc.edu/
>
> On 4/25/10 2:20 PM, Peng Yu wrote:
>>
>> Hi,
>>
>> I have the following two sequences. The query has one nucleotide
>> missing at position 13 compared with the database.
>> $ cat query.fasta
>>>
>>> test_sequence
>>
>> cttgcaccggaatgtctgctccaga
>> $ cat database.fasta
>>>
>>> database_chr1
>>
>> cttgcaccggaaagtctgctccaga
>>
>>
>> Then I run blast with the following command.
>>
>> blat -t=dna -q=dna -stepSize=2 -minScore=25 -maxGap=1 -out=pslx
>> database.fasta query.fasta query2.pslx
>> blat -t=dna -q=dna -stepSize=3 -minScore=25 -maxGap=1 -out=pslx
>> database.fasta query.fasta query3.pslx
>>
>> The resulted files are the following. I understand that stepSize is
>> the offset between the K-mers in the database. But I still don't
>> understand why stepSize has to be less than or equal to 2 to detect
>> this query in the database. Could you help me understand it?
>>
>> $ cat query2.pslx
>> psLayout version 3
>>
>> match   mis-    rep.    N's     Q gap   Q gap   T gap   T gap   strand  Q
>>               Q       Q
>>        Q       T               T       T       T       block   blockSizes
>>      qStarts  tStarts
>>        match   match           count   bases   count   bases
>> name
>>        size    start   end     name            size    start   end
>> count
>>
>> ---------------------------------------------------------------------------------------------------------------------------------------------------------------
>> 24      1       0       0       0       0       0       0       +
>> test_sequence   25      0       25      database_chr1   25      0       25
>>    1       25,     0,      0,      cttgcaccggaatgtctgctccaga,
>>  cttgcaccggaaagtctgctccaga,
>> $ cat query3.pslx
>> psLayout version 3
>>
>> match   mis-    rep.    N's     Q gap   Q gap   T gap   T gap   strand  Q
>>               Q       Q
>>        Q       T               T       T       T       block   blockSizes
>>      qStarts  tStarts
>>        match   match           count   bases   count   bases
>> name
>>        size    start   end     name            size    start   end
>> count
>>
>> ---------------------------------------------------------------------------------------------------------------------------------------------------------------
>>
>>
>



-- 
Regards,
Peng

_______________________________________________
Genome maillist  -  [email protected]
https://lists.soe.ucsc.edu/mailman/listinfo/genome

Re: [Genome] understanding the effect of stepSize of BLAT

Reply via email to