I have reproduced your results, and your blat and canFam2.2bit are working correctly.
hgBlat uses gfServer, so running gfClient against gfServer will produce results that are more similar to hgBlat results, e.g. with 106 as the top hit for in.fa.

Your test sequence is fairly long at 27k bases. (In fact, gfServer will only align the first 40k of any huge query it receives.) gfServer has other differences from blat as well. In particular, it defaults to returning only the 100 most promising-looking alignments per strand. The reason you see 116 for + and 123 for - instead of 100 and 100 is that a later processing step in blat split some of the alignments into multiple shorter alignments.

Also, the fact that your 27k query does not produce any alignments with very high scores means that only small pieces of it are aligning, in many places around the genome.

Because hgBlat is meant for interactive use, where people can look at the results and make their own judgments, its threshold is very low: something like a score of 20. This is why we mention setting minIdentity and minScore so low for blat or gfClient. But that was ONLY for the case where you want to reproduce hgBlat output, which should not even be considered a goal for most work. What people usually do is post-filter their psl results with tools like pslReps and pslCDnaFilter, which offer much more flexible filtering than just minScore and minIdentity.

http://genome.cse.ucsc.edu/FAQ/FAQblat.html

-Galt

On 04/06/11 02:35, Ilinca Tudose wrote:
> Dear UCSC Genome Browser team,
>
> We have problems to recover your online blat search results with our local
> blat searches. Our problem is that we get a lot more hits than the online
> search reports but we can not figure out why they are not reported.
>
> We are using the stand alone blat version with the same parameters as
> suggested on the FAQ page and I compute the score like you do in your script
> (psl.c). We are blatting DNA and use the same genome version.
>
> blat -stepSize=5 -repMatch=1024 -minScore=0 -minIdentity=0 canFam2.2bit
> in.fa temp.psl
> (pslWithScores.psl has the score and a percent identity in its first two
> columns and in temp.psl you will find the output file from the local blat
> search).
>
> When we blat the sequence attached (in.fa) we get the best score 597, while
> the web-tool's best hit has the score 106. As you can see, we also get this
> hit (so the computation of the score is correct), but we also get a lot more
> (and not few better ones). Is it something we need to change about the
> parameters or do we need to filter the results afterwards?
>
> Here is an example:
>
> 138 22 0 0 5 349 5 228 + Emx2os_Human 7282
> 3898 4407 chr24 50763139 31027810 31028198 7
> 66,27,8,10,7,8,34, 3898,3977,4014,4038,4048,4358,4373,
> 31027810,31027890,31027925,31028034,31028107,31028156,31028164,
> This is your best match (according to the online tool) as is in the psl
> output file, which scores 106.
>
> 774 121 0 0 27 3588 29 60189 - Emx2os_Human
> 7282 69 4552 chr28 44191819 30945073 31006157 33
> 8,4,17,8,32,29,51,12,22,95,17,15,31,8,38,6,6,36,38,6,16,39,10,7,24,21,22,67,10,22,144,28,6,
> 2730,2744,2749,2766,2780,2889,3578,3683,3703,3730,3950,3977,3999,4060,4068,4107,4113,4537,4729,4774,4781,6119,6164,6232,6246,6352,6373,6788,6943,6966,6992,7179,7207,
> 30945073,30945094,30945098,30945116,30945129,30945237,30945598,30945689,30945719,30945742,30945974,30946002,30946023,30946079,30946098,30946136,30946143,30946252,30946463,30946509,30946520,30946607,30946651,30946708,30946719,30946823,30946845,31005736,31005884,31005908,31005930,31006116,31006151,
> This is the highest scoring hit we got. The score is 597, but between this
> one and 106 there are many better scores. Why do you not report this hits?
>
> Kind regards,
> ilinca tudose
>
>
> ------------------------------------------------------------------------
>
> _______________________________________________
> Genome maillist - [email protected]
> https://lists.soe.ucsc.edu/mailman/listinfo/genome
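
For anyone following this thread: the two scores discussed above (106 and 597) can be recomputed from the leading columns of each PSL line. The Python sketch below assumes the DNA scoring rule as I understand it from pslScore in psl.c and the BLAT FAQ (matches, plus half of the repeat matches, minus mismatches, minus the number of query and target inserts). The function names `psl_score` and `filter_by_score` are hypothetical helpers for illustration, not part of any UCSC tool, and a plain score cutoff like this is far less flexible than pslReps or pslCDnaFilter.

```python
def psl_score(fields, is_protein=False):
    """Score one PSL record from its whitespace-split columns.

    Assumed formula (DNA case uses a size multiplier of 1):
    match + repMatch/2 - misMatch - qNumInsert - tNumInsert
    """
    match = int(fields[0])
    mismatch = int(fields[1])
    rep_match = int(fields[2])
    q_num_insert = int(fields[4])
    t_num_insert = int(fields[6])
    size_mul = 3 if is_protein else 1
    return (size_mul * (match + (rep_match >> 1))
            - size_mul * mismatch
            - q_num_insert
            - t_num_insert)


def filter_by_score(lines, min_score):
    """Keep PSL data lines scoring at least min_score; skip header lines."""
    kept = []
    for line in lines:
        fields = line.split()
        # PSL headers and separators do not start with a numeric column.
        if len(fields) < 8 or not fields[0].isdigit():
            continue
        if psl_score(fields) >= min_score:
            kept.append(line)
    return kept


# First eight columns of the two records quoted in this thread.
web_top_hit = "138 22 0 0 5 349 5 228"
local_top_hit = "774 121 0 0 27 3588 29 60189"

print(psl_score(web_top_hit.split()))    # 106
print(psl_score(local_top_hit.split()))  # 597
print(filter_by_score([web_top_hit, local_top_hit], min_score=200))
```

With a cutoff of 200, only the second record survives, which shows how much the reported hit list depends on the threshold chosen.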
