Hi Jennifer, I made up the example to understand how BLAT works internally not for short sequencing mapping. I made the query and the database short deliberately.
I don't think that I understand enough the details of the algorithm after reading BLAT paper so that I can explain why the match is missing for this particular case. Would you please help me understand how BLAT work for this case of stepSize=3 but not 2? On Mon, Apr 26, 2010 at 4:06 PM, Jennifer Jackson <[email protected]> wrote: > Hello Peng, > > These sequences are short. In order to get a minScore of 25, very short > windows of matching sequence must be squeezed together along the query. > > You are at the very lower edge of sequence length for BLAT to function. As > noted in the BLAT doc, at this sequence length legitimate matches may even > be missed. That is just how the program is designed - not optimized for very > short sequences. > > If you have a lot of short sequences (next gen?), then using one of the > aligners specifically designed for that type of query is probably a better, > not to mention quicker, option. > > A quick google can bring up several alignment tool options for short > sequences. Which to use will be your decision and may require some > experimentation to find the best for your project. > > Thanks, > Jennifer > > > --------------------------------- > Jennifer Jackson > UCSC Genome Informatics Group > http://genome.ucsc.edu/ > > On 4/25/10 2:20 PM, Peng Yu wrote: >> >> Hi, >> >> I have the following two sequences. The query has one nucleotide >> missing at position 13 compared with the database. >> $ cat query.fasta >>> >>> test_sequence >> >> cttgcaccggaatgtctgctccaga >> $ cat database.fasta >>> >>> database_chr1 >> >> cttgcaccggaaagtctgctccaga >> >> >> Then I run blast with the following command. >> >> blat -t=dna -q=dna -stepSize=2 -minScore=25 -maxGap=1 -out=pslx >> database.fasta query.fasta query2.pslx >> blat -t=dna -q=dna -stepSize=3 -minScore=25 -maxGap=1 -out=pslx >> database.fasta query.fasta query3.pslx >> >> The resulted files are the following. I understand that stepSize is >> the offset between the K-mers in the database. But I still don't >> understand why stepSize has to be less than or equal to 2 to detect >> this query in the database. Could you help me understand it? >> >> $ cat query2.pslx >> psLayout version 3 >> >> match mis- rep. N's Q gap Q gap T gap T gap strand Q >> Q Q >> Q T T T T block blockSizes >> qStarts tStarts >> match match count bases count bases >> name >> size start end name size start end >> count >> >> --------------------------------------------------------------------------------------------------------------------------------------------------------------- >> 24 1 0 0 0 0 0 0 + >> test_sequence 25 0 25 database_chr1 25 0 25 >> 1 25, 0, 0, cttgcaccggaatgtctgctccaga, >> cttgcaccggaaagtctgctccaga, >> $ cat query3.pslx >> psLayout version 3 >> >> match mis- rep. N's Q gap Q gap T gap T gap strand Q >> Q Q >> Q T T T T block blockSizes >> qStarts tStarts >> match match count bases count bases >> name >> size start end name size start end >> count >> >> --------------------------------------------------------------------------------------------------------------------------------------------------------------- >> >> > -- Regards, Peng _______________________________________________ Genome maillist - [email protected] https://lists.soe.ucsc.edu/mailman/listinfo/genome
