Hello Peng, Jim Kent, the BLAT program author, has control over all aspects of the program design, but we thank you again for your input!
Take care,
Jennifer

---------------------------------
Jennifer Jackson
UCSC Genome Informatics Group
http://genome.ucsc.edu/

On 4/30/10 5:21 AM, Peng Yu wrote:
> On Tue, Apr 27, 2010 at 9:00 PM, Galt Barber <[email protected]> wrote:
>>
>> Hi, Peng!
>>
>> As the FAQ points out
>> http://genome.ucsc.edu/FAQ/FAQblat.html
>>
>> "A note on filtering output: increasing the -minScore parameter value beyond
>> one-half of the query size has no further effect. Therefore, use either the
>> pslReps or pslCDnaFilter program available in the Genome Browser source
>> code to filter for the size, score, coverage, or quality desired. For
>> information on obtaining the source code, see our FAQ on source code
>> licensing and downloads."
>>
>> This seems to have been an odd restriction
>> which was removed at the urging of users,
>> however, the change came only in 2008:
>>
>> blat/version.doc
>> 1.72 (galt 09-Dec-08): (in blat version 34x3)
>> Fixed -minScore, filter was not working when over half query-size.
>> v197_branch: 1.72.0.2
>>
>> revision 1.72
>> date: 2008/12/09 08:11:46; author: galt; state: Exp; lines: +1 -0
>> fixing minScore
>> ----------------------------
>>
>> galt
>> Tue Dec 9 08:11:46 2008 +0000
>> fixing minScore
>> diff --git src/jkOwnLib/gfBlatLib.c src/jkOwnLib/gfBlatLib.c
>> --- src/jkOwnLib/gfBlatLib.c
>> +++ src/jkOwnLib/gfBlatLib.c
>> @@ -18,7 +18,7 @@
>>
>>
>>  static void saveAlignments(char *chromName, int chromSize, int chromOffset,
>>          struct ssBundle *bun, struct hash *t3Hash,
>>          boolean qIsRc, boolean tIsRc,
>>          enum ffStringency stringency, int minMatch, struct gfOutput *out)
>>  /* Save significant alignments to file in .psl format. */
>>  {
>>  struct dnaSeq *tSeq = bun->genoSeq, *qSeq = bun->qSeq;
>>  struct ssFfItem *ffi;
>> -if (minMatch > qSeq->size/2) minMatch = qSeq->size/2;
>> -if (minMatch < 1) minMatch = 1;
>>  for (ffi = bun->ffList; ffi != NULL; ffi = ffi->next)
>>      {
>>      struct ffAli *ff = ffi->ff;
>>      struct trans3 *t3List = NULL;
>>      int score;
>>      if (t3Hash != NULL)
>>          t3List = hashMustFindVal(t3Hash, tSeq->name);
>>      score = scoreAli(ff, bun->isProt, stringency, tSeq, t3List);
>>      if (score >= minMatch)
>>          {
>>          out->out(chromName, chromSize, chromOffset, ff, tSeq, t3Hash, qSeq,
>>              qIsRc, tIsRc, stringency, minMatch, out);
>>          }
>>      }
>>  }
>>
>> See the two lines leading with "-" ?
>> They were deleted. They seemed to be
>> unneeded and causing unexpected behavior
>> to users.
>>
>> Unfortunately, Jim Kent's official release
>> seems to date back to 2007, but you could
>> get the source and compile it.
>>
>> Any blat version after 34x3 should have the fix.
>>
>> With the newer version, the cutoff works more
>> as you would expect. And for your example
>> of a 25bp stretch of dna with one mismatch,
>> your score would be +24 for the matches and
>> -1 for the 1 mismatch, thus score=24-1==23.
>>
>> And thus if you use minScore of 23 or lower
>> you can see the output psl record.
>> -minScore=23
>>
>> As we mentioned before,
>> you can just set minScore to zero and
>> then filter the psl output
>> with other tools afterwards.
>
> Hi,
>
> Since setting minScore to zero would probably be more common than other
> cases, I think that it makes sense to change its default value to 0
> rather than an arbitrary number 30 as it is right now. Do you agree?
>
>> -Galt
>>
>> On 4/27/2010 3:35 PM, Peng Yu wrote:
>>>
>>> Hi Galt,
>>>
>>> Here is the command that I use. You mentioned "Generally people don't
>>> much bother with using BLAT's own commandline options for minScore,
>>> etc." But I want to understand what minScore is and when it can be
>>> ignored. Would you please let me know?
>>>
>>>
>>> $ blat -t=dna -q=dna -stepSize=5 -minScore=25 -maxGap=0 -noHead \
>>> database.fasta \
>>> query.fasta \
>>> query.psl
>>> $ cat query.fasta
>>> >test_sequence
>>> cttgcaccggaaagtctgctccaga
>>> $ cat database.fasta
>>> >database_chr1
>>> ctagcaccggaaagtctgctccaga
>>> $ cat query.psl
>>> 24 1 0 0 0 0 0 0 +
>>> test_sequence 25 0 25 database_chr1 25 0 25
>>> 1 25, 0, 0,
>>>
>>>
>>>
>>> On Mon, Apr 26, 2010 at 4:30 PM, Jennifer Jackson <[email protected]>
>>> wrote:
>>>>
>>>> Hello Peng,
>>>>
>>>> Very sorry, your reply went to the genome mailing list only, not to your
>>>> email address as well. Our apologies.
>>>>
>>>> Here is the posting:
>>>> https://lists.soe.ucsc.edu/pipermail/genome/2010-April/022012.html
>>>>
>>>> Jennifer
>>>>
>>>> ---------------------------------
>>>> Jennifer Jackson
>>>> UCSC Genome Informatics Group
>>>> http://genome.ucsc.edu/
>>>>
>>>> On 4/24/10 12:09 PM, Peng Yu wrote:
>>>>>
>>>>> Could somebody answer me the following question?
>>>>>
>>>>> On Wed, Apr 21, 2010 at 2:48 PM, Peng Yu <[email protected]> wrote:
>>>>>>
>>>>>> I'm wondering what "some sort of gap penalty" refers to. Also, when I
>>>>>> query a 25bp sequence using the default, BLAT still gives the result.
>>>>>> By definition a 25bp sequence should at most have a score of 25, which
>>>>>> is less than 30. Why does the query still return the result?
>>>>>>
>>>>>> -minScore=N sets minimum score. This is the matches minus the
>>>>>> mismatches minus some sort of gap penalty. Default is 30
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Regards,
>>>>>> Peng
