Hello Peng, These sequences are short. In order to get a minScore of 25, very short windows of matching sequence must be squeezed together along the query.
You are at the very lower edge of sequence length for BLAT to function. As noted in the BLAT doc, at this sequence length legitimate matches may even be missed. That is just how the program is designed - not optimized for very short sequences. If you have a lot of short sequences (next gen?), then using one of the aligners specifically designed for that type of query is probably a better, not to mention quicker, option. A quick google can bring up several alignment tool options for short sequences. Which to use will be your decision and may require some experimentation to find the best for your project. Thanks, Jennifer --------------------------------- Jennifer Jackson UCSC Genome Informatics Group http://genome.ucsc.edu/ On 4/25/10 2:20 PM, Peng Yu wrote: > Hi, > > I have the following two sequences. The query has one nucleotide > missing at position 13 compared with the database. > $ cat query.fasta >> test_sequence > cttgcaccggaatgtctgctccaga > $ cat database.fasta >> database_chr1 > cttgcaccggaaagtctgctccaga > > > Then I run blast with the following command. > > blat -t=dna -q=dna -stepSize=2 -minScore=25 -maxGap=1 -out=pslx > database.fasta query.fasta query2.pslx > blat -t=dna -q=dna -stepSize=3 -minScore=25 -maxGap=1 -out=pslx > database.fasta query.fasta query3.pslx > > The resulted files are the following. I understand that stepSize is > the offset between the K-mers in the database. But I still don't > understand why stepSize has to be less than or equal to 2 to detect > this query in the database. Could you help me understand it? > > $ cat query2.pslx > psLayout version 3 > > match mis- rep. N's Q gap Q gap T gap T gap strand Q > Q Q > Q T T T T block blockSizes > qStarts tStarts > match match count bases count bases > name > size start end name size start end count > --------------------------------------------------------------------------------------------------------------------------------------------------------------- > 24 1 0 0 0 0 0 0 + > test_sequence 25 0 25 database_chr1 25 0 25 > 1 25, 0, 0, cttgcaccggaatgtctgctccaga, > cttgcaccggaaagtctgctccaga, > $ cat query3.pslx > psLayout version 3 > > match mis- rep. N's Q gap Q gap T gap T gap strand Q > Q Q > Q T T T T block blockSizes > qStarts tStarts > match match count bases count bases > name > size start end name size start end count > --------------------------------------------------------------------------------------------------------------------------------------------------------------- > > _______________________________________________ Genome maillist - [email protected] https://lists.soe.ucsc.edu/mailman/listinfo/genome
