Hi, Peng!
As the FAQ points out:
http://genome.ucsc.edu/FAQ/FAQblat.html
"A note on filtering output: increasing the -minScore parameter value
beyond one-half of the query size has no further effect. Therefore, use
either the pslReps or pslCDnaFilter program available in the Genome
Browser source code to filter for the size, score, coverage, or quality
desired. For information on obtaining the source code, see our FAQ on
source code licensing and downloads."
This seems to have been an odd restriction,
which was removed at the urging of users;
however, the change came only in 2008:
blat/version.doc
1.72 (galt 09-Dec-08): (in blat version 34x3)
Fixed -minScore, filter was not working when over half query-size.
v197_branch: 1.72.0.2
revision 1.72
date: 2008/12/09 08:11:46; author: galt; state: Exp; lines: +1 -0
fixing minScore
----------------------------
galt
Tue Dec 9 08:11:46 2008 +0000
fixing minScore
diff --git src/jkOwnLib/gfBlatLib.c src/jkOwnLib/gfBlatLib.c
--- src/jkOwnLib/gfBlatLib.c
+++ src/jkOwnLib/gfBlatLib.c
@@ -18,7 +18,7 @@
 static void saveAlignments(char *chromName, int chromSize, int chromOffset,
         struct ssBundle *bun, struct hash *t3Hash,
         boolean qIsRc, boolean tIsRc,
         enum ffStringency stringency, int minMatch, struct gfOutput *out)
 /* Save significant alignments to file in .psl format. */
 {
 struct dnaSeq *tSeq = bun->genoSeq, *qSeq = bun->qSeq;
 struct ssFfItem *ffi;
-if (minMatch > qSeq->size/2) minMatch = qSeq->size/2;
-if (minMatch < 1) minMatch = 1;
 for (ffi = bun->ffList; ffi != NULL; ffi = ffi->next)
     {
     struct ffAli *ff = ffi->ff;
     struct trans3 *t3List = NULL;
     int score;
     if (t3Hash != NULL)
         t3List = hashMustFindVal(t3Hash, tSeq->name);
     score = scoreAli(ff, bun->isProt, stringency, tSeq, t3List);
     if (score >= minMatch)
         {
         out->out(chromName, chromSize, chromOffset, ff, tSeq, t3Hash, qSeq,
             qIsRc, tIsRc, stringency, minMatch, out);
         }
     }
 }
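See the two lines beginning with "-"?
They were deleted. The first one capped
minMatch (which receives the -minScore value)
at half the query size, which is why raising
-minScore above that had no effect. The lines
seemed unneeded and were causing unexpected
behavior for users.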
Unfortunately, Jim Kent's official release
seems to date back to 2007, but you could
get the source and compile it yourself.
Any blat version from 34x3 onward should have the fix.
With the newer version, the cutoff works more
as you would expect. For your example of a 25bp
stretch of DNA with one mismatch, the score would
be +24 for the matches and -1 for the one mismatch,
so score = 24 - 1 = 23. (The older code silently
lowered your -minScore=25, and the default of 30,
to 25/2 = 12, which is why you still saw a hit.)
With the fix, if you use a minScore of 23 or lower,
you can see the output psl record:
-minScore=23
As we mentioned before,
you can just set minScore to zero and
then filter the psl output afterwards
with other tools such as pslReps or pslCDnaFilter.
-Galt
On 4/27/2010 3:35 PM, Peng Yu wrote:
> Hi Galt,
>
> Here is the command that I use. You mentioned "Generally people don't
> much bother with using BLAT's own commandline options for minScore,
> etc." But I want to understand what minScore is and when it can be
> ignored. Would you please let me know?
>
>
> $ blat -t=dna -q=dna -stepSize=5 -minScore=25 -maxGap=0 -noHead \
> database.fasta \
> query.fasta \
> query.psl
> $ cat query.fasta
> >test_sequence
> cttgcaccggaaagtctgctccaga
> $ cat database.fasta
> >database_chr1
> ctagcaccggaaagtctgctccaga
> $ cat query.psl
> 24 1 0 0 0 0 0 0 + test_sequence 25 0 25 database_chr1 25 0 25 1 25, 0, 0,
>
>
>
> On Mon, Apr 26, 2010 at 4:30 PM, Jennifer Jackson<[email protected]> wrote:
>> Hello Peng,
>>
>> Very sorry, your reply went to the genome mailing list only, not to your
>> email address as well. Our apologies.
>>
>> Here is the posting:
>> https://lists.soe.ucsc.edu/pipermail/genome/2010-April/022012.html
>>
>> Jennifer
>>
>> ---------------------------------
>> Jennifer Jackson
>> UCSC Genome Informatics Group
>> http://genome.ucsc.edu/
>>
>> On 4/24/10 12:09 PM, Peng Yu wrote:
>>>
>>> Could somebody answer the following question for me?
>>>
>>> On Wed, Apr 21, 2010 at 2:48 PM, Peng Yu<[email protected]> wrote:
>>>>
>>>> I'm wondering what "some sort of gap penalty" refers to. Also, when I
>>>> query a 25bp sequence using the default, BLAT still returns a result.
>>>> By definition, a 25bp sequence can have a score of at most 25, which is
>>>> less than 30. Why does the query still return a result?
>>>>
>>>> -minScore=N sets minimum score. This is the matches minus the
>>>> mismatches minus some sort of gap penalty. Default is 30
>>>>
>>>>
>>>> --
>>>> Regards,
>>>> Peng
>>>>
>>>
>>>
>>>
>>
>
>
>
_______________________________________________
Genome maillist - [email protected]
https://lists.soe.ucsc.edu/mailman/listinfo/genome