On Tue, Apr 27, 2010 at 9:00 PM, Galt Barber <[email protected]> wrote:
>
> Hi, Peng!
>
> As the FAQ points out
>  http://genome.ucsc.edu/FAQ/FAQblat.html
>
> "A note on filtering output: increasing the -minScore parameter value beyond
> one-half of the query size has no further effect. Therefore, use either the
> pslReps or pslCDnaFilter  program available in the Genome Browser source
> code to filter for the size, score, coverage, or quality desired. For
> information on obtaining the source code, see our FAQ on source code
> licensing and downloads. "
>
> This seems to have been an odd restriction
> which was removed at the urging of users.
> However, the change came only in 2008:
>
> blat/version.doc
> 1.72 (galt 09-Dec-08): (in blat version 34x3)
> Fixed -minScore, filter was not working when over half query-size.
>    v197_branch: 1.72.0.2
>
> revision 1.72
> date: 2008/12/09 08:11:46;  author: galt;  state: Exp;  lines: +1 -0
> fixing minScore
> ----------------------------
>
> galt
>  Tue Dec 9 08:11:46 2008 +0000
> fixing minScore
> diff --git src/jkOwnLib/gfBlatLib.c src/jkOwnLib/gfBlatLib.c
> --- src/jkOwnLib/gfBlatLib.c
> +++ src/jkOwnLib/gfBlatLib.c
> @@ -18,7 +18,7 @@
>
>
> static void saveAlignments(char *chromName, int chromSize, int chromOffset,
>        struct ssBundle *bun, struct hash *t3Hash,
>        boolean qIsRc, boolean tIsRc,
>        enum ffStringency stringency, int minMatch, struct gfOutput *out)
>  /* Save significant alignments to file in .psl format. */
>  {
>  struct dnaSeq *tSeq = bun->genoSeq, *qSeq = bun->qSeq;
>  struct ssFfItem *ffi;
> -if (minMatch > qSeq->size/2) minMatch = qSeq->size/2;
> -if (minMatch < 1) minMatch = 1;
>  for (ffi = bun->ffList; ffi != NULL; ffi = ffi->next)
>     {
>     struct ffAli *ff = ffi->ff;
>     struct trans3 *t3List = NULL;
>     int score;
>     if (t3Hash != NULL)
>        t3List = hashMustFindVal(t3Hash, tSeq->name);
>     score = scoreAli(ff, bun->isProt, stringency, tSeq, t3List);
>     if (score >= minMatch)
>        {
>        out->out(chromName, chromSize, chromOffset, ff, tSeq, t3Hash, qSeq,
>            qIsRc, tIsRc, stringency, minMatch, out);
>        }
>     }
>  }
>
> See the two lines leading with "-" ?
> They were deleted.  They seemed to be
> unneeded and causing unexpected behavior
> to users.
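
To make the effect of those two deleted lines concrete, here is a minimal C
sketch. The helper names oldEffectiveMinMatch, newEffectiveMinMatch and
isReported are made up for illustration and are not part of the BLAT source;
only the two clamping statements and the "score >= minMatch" comparison are
taken from the diff above.

```c
#include <assert.h>

/* Hypothetical helper: the threshold saveAlignments() effectively used
 * BEFORE the fix, i.e. with the two deleted lines still in place. */
int oldEffectiveMinMatch(int minMatch, int qSize)
{
    assert(qSize >= 0);
    if (minMatch > qSize / 2) minMatch = qSize / 2;  /* clamp to half the query size */
    if (minMatch < 1) minMatch = 1;                  /* never below 1 */
    return minMatch;
}

/* Hypothetical helper: after the fix the requested value is used unchanged. */
int newEffectiveMinMatch(int minMatch, int qSize)
{
    (void)qSize;  /* the query size no longer influences the threshold */
    return minMatch;
}

/* Same test as in the loop of the diff: keep the alignment if score >= threshold. */
int isReported(int score, int threshold)
{
    return score >= threshold;
}
```

For the 25 bp query in this thread, oldEffectiveMinMatch(25, 25) and
oldEffectiveMinMatch(30, 25) both give 12 (integer division), so an alignment
scoring 23 passes isReported() under the old code even with -minScore=25 or
the default of 30. With the clamp removed, the same alignment is dropped
unless minScore is 23 or lower.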
>
> Unfortunately, Jim Kent's official release
> seems to date back to 2007, but you could
> get the source and compile it.
>
> Any blat version after 34x3 should have the fix.
>
> With the newer version, the cutoff works more
> as you would expect.  And for your example
> of a 25bp stretch of dna with one mismatch,
> your score would be +24 for the matches and
> -1 for the 1 mismatch, thus score=24-1==23.
>
> And thus if you use minScore of 23 or lower
> you can see the output psl record.
>  -minScore=23
>
> As we mentioned before,
> you can just set minScore to zero and
> then filter the psl output
> with other tools afterwards.
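
For the "filter afterwards" route, a rough C sketch of such a filter follows.
It is not pslReps or pslCDnaFilter, and the function names pslLineScore and
filterPsl are hypothetical. It assumes the standard PSL layout, where the
first eight columns are matches, misMatches, repMatches, nCount, qNumInsert,
qBaseInsert, tNumInsert and tBaseInsert, and it ASSUMES the score is matches
minus mismatches minus the two insert counts as a stand-in for "some sort of
gap penalty". The score BLAT computes internally may differ.

```c
#include <assert.h>
#include <stdio.h>

/* Parse the first eight integer columns of one PSL data line and store
 * an approximate score in *score. Returns 1 on success, 0 if the line is
 * not a PSL data line (for example a header line).
 * ASSUMPTION: score = matches - misMatches - qNumInsert - tNumInsert. */
int pslLineScore(const char *line, int *score)
{
    int match, misMatch, repMatch, nCount;
    int qNumInsert, qBaseInsert, tNumInsert, tBaseInsert;

    assert(line != NULL && score != NULL);
    if (sscanf(line, "%d %d %d %d %d %d %d %d",
               &match, &misMatch, &repMatch, &nCount,
               &qNumInsert, &qBaseInsert, &tNumInsert, &tBaseInsert) != 8)
        return 0;
    *score = match - misMatch - qNumInsert - tNumInsert;
    return 1;
}

/* Copy every PSL data line whose score is at least minScore from in to out.
 * Lines that do not parse are dropped. Lines longer than the buffer are not
 * handled in this sketch. Returns the number of lines kept. */
int filterPsl(FILE *in, FILE *out, int minScore)
{
    char line[65536];
    int kept = 0;
    int score;

    while (fgets(line, sizeof line, in) != NULL)
    {
        if (pslLineScore(line, &score) && score >= minScore)
        {
            fputs(line, out);
            ++kept;
        }
    }
    return kept;
}
```

A small main() that calls filterPsl(stdin, stdout, 23) would keep the psl
record quoted further down in this thread, since its first columns (24
matches, 1 mismatch, no inserts) give 24 - 1 = 23 under the assumed formula.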

Hi,

Since setting minScore to zero is probably more common than the other
cases, I think it makes sense to change its default value to 0 rather
than the arbitrary 30 it is right now. Do you agree?

> -Galt
>
> On 4/27/2010 3:35 PM, Peng Yu wrote:
>>
>> Hi Galt,
>>
>> Here is the command that I use. You mentioned "Generally people don't
>> much bother with using BLAT's own commandline options for minScore,
>> etc." But I want to understand what minScore is and when it can be
>> ignored. Would you please let me know?
>>
>>
>> $ blat -t=dna -q=dna -stepSize=5 -minScore=25 -maxGap=0 -noHead \
>>                database.fasta \
>>                query.fasta \
>>                query.psl
>> $ cat query.fasta
>> >test_sequence
>> cttgcaccggaaagtctgctccaga
>> $ cat database.fasta
>> >database_chr1
>> ctagcaccggaaagtctgctccaga
>> $ cat query.psl
>> 24      1       0       0       0       0       0       0       +
>> test_sequence   25      0       25      database_chr1   25      0       25
>>    1       25,     0,      0,
>>
>>
>>
>> On Mon, Apr 26, 2010 at 4:30 PM, Jennifer Jackson<[email protected]>
>>  wrote:
>>>
>>> Hello Peng,
>>>
>>> Very sorry, your reply went to the genome mailing list only, not to your
>>> email address as well. Our apologies.
>>>
>>> Here is the posting:
>>> https://lists.soe.ucsc.edu/pipermail/genome/2010-April/022012.html
>>>
>>> Jennifer
>>>
>>> ---------------------------------
>>> Jennifer Jackson
>>> UCSC Genome Informatics Group
>>> http://genome.ucsc.edu/
>>>
>>> On 4/24/10 12:09 PM, Peng Yu wrote:
>>>>
>>>> Could somebody answer me the following question?
>>>>
>>>> On Wed, Apr 21, 2010 at 2:48 PM, Peng Yu<[email protected]>    wrote:
>>>>>
>>>>> I'm wondering what "some sort of gap penalty" refers to. Also, when I
>>>>> query a 25bp sequence using the default, BLAT still gives the result.
>>>>> By definition a 25bp sequence should have a score of at most 25, which
>>>>> is less than 30. Why does the query still return the result?
>>>>>
>>>>>   -minScore=N sets minimum score.  This is the matches minus the
>>>>>               mismatches minus some sort of gap penalty.  Default is 30
>>>>>
>>>>>
>>>>> --
>>>>> Regards,
>>>>> Peng
>>>>>
>>>>
>>>>
>>>>
>>>
>>
>>
>>
>
>



-- 
Regards,
Peng

_______________________________________________
Genome maillist  -  [email protected]
https://lists.soe.ucsc.edu/mailman/listinfo/genome
