I have reproduced your results, and your blat and canFam2.2bit are 
working correctly.

hgBlat uses gfServer, so using gfClient against gfServer
will produce results that are more similar to hgBlat results,
e.g. with 106 as the top hit for in.fa.

Your test sequence is fairly long, at 27k bases.
(In fact gfServer will only align the first 40k of any huge query received.)

gfServer has other differences from blat as well.
In particular, it defaults to returning only the 100 most
promising-looking alignments per strand.
The reason you see 116 for + and 123 for - instead of 100 and 100
is that a later processing step in blat chose to split some
of the alignments into multiple shorter alignments.

Also, the fact that your 27k query doesn't produce any alignments
with very high scores means that only small pieces of it are aligning,
in many places around the genome.

Because hgBlat is meant for interactive use where people can
look at the results and make their own judgments, the threshold
is very low, something like a score of just 20.
This is why we mention setting minIdentity and minScore so low
for blat or gfClient.  But that applies ONLY if you want to
reproduce hgBlat output, something that should not even be
considered a goal for most work.
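
For reference, the score discussed here can be recomputed directly from a
psl line. The following is a minimal Python sketch, assuming the DNA case of
the score calculation in psl.c (matches plus half the repeat matches, minus
mismatches and the number of inserts on the query and target sides). The
function name psl_score and the MIN_SCORE value of 20 are illustrative
assumptions, not hgBlat code.

```python
def psl_score(fields):
    """Score a DNA psl record, given its whitespace-split columns.

    Assumed column layout (standard psl): 0 matches, 1 misMatches,
    2 repMatches, 3 nCount, 4 qNumInsert, 5 qBaseInsert, 6 tNumInsert.
    """
    match, mismatch, rep_match = int(fields[0]), int(fields[1]), int(fields[2])
    q_num_insert, t_num_insert = int(fields[4]), int(fields[6])
    return match + rep_match // 2 - mismatch - q_num_insert - t_num_insert


# First eight columns of the two example lines quoted in the question below.
web_top_hit = "138 22 0 0 5 349 5 228".split()
local_top_hit = "774 121 0 0 27 3588 29 60189".split()

MIN_SCORE = 20  # assumption: rough stand-in for the low interactive threshold

for name, rec in (("web top hit", web_top_hit), ("local top hit", local_top_hit)):
    score = psl_score(rec)
    print(name, score, "kept" if score >= MIN_SCORE else "dropped")
```

Both example lines from the question reproduce the scores reported there
(106 and 597), so the scoring itself is not the source of the difference.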

What people usually do is post-filter their psl results with tools like
pslReps and pslCDnaFilter, which offer much more flexible
filtering than just minScore and minIdentity.
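
As a rough illustration of the idea behind post-filtering, here is a Python
sketch that keeps, for each query, only the alignments whose score is within
a given fraction of that query's best score. This is a deliberately
simplified stand-in and not the algorithm used by pslReps or pslCDnaFilter,
which apply more elaborate criteria. The names Hit and near_top_filter and
the 0.01 default are hypothetical, chosen only for this example.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    q_name: str   # query sequence name
    t_name: str   # target (chromosome) name
    score: int    # precomputed alignment score


def near_top_filter(hits, near_top=0.01):
    """Keep hits scoring within `near_top` (a fraction) of the best per query."""
    best = {}
    for h in hits:
        best[h.q_name] = max(best.get(h.q_name, h.score), h.score)
    kept = []
    for h in hits:
        top = best[h.q_name]
        # Subtract a margin proportional to the magnitude of the best score.
        if h.score >= top - abs(top) * near_top:
            kept.append(h)
    return kept


# The two hits quoted in the question below, with their reported scores.
hits = [
    Hit("Emx2os_Human", "chr28", 597),
    Hit("Emx2os_Human", "chr24", 106),
]
kept = near_top_filter(hits)
print(kept)
```

With these two hits only the chr28 alignment survives, because 106 is far
below the best score for that query. A looser near_top value keeps more of
the lower-scoring alignments.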

http://genome.cse.ucsc.edu/FAQ/FAQblat.html

-Galt

On 04/06/11 02:35, Ilinca Tudose wrote:
> Dear UCSC Genome Browser team,
> 
> We have problems to recover your online blat search results with our local
> blat searches. Our problem is that we get a lot more hits than the online
> search reports but we can not figure out why they are not reported.
> 
> We are using the stand alone blat version with the same parameters as
> suggested on the FAQ page and I compute the score like you do in your script
> (psl.c). We are blatting DNA and use the same genome version.
> 
> blat -stepSize=5 -repMatch=1024 -minScore=0 -minIdentity=0 canFam2.2bit
> in.fa temp.psl
> (pslWithScores.psl has the score and a percent identity in its first two
> columns and in temp.psl you will find the output file from the local blat
> search).
> 
> When we blat the sequence attached (in.fa) we get the best score 597, while
> the web-tool's best hit has the score 106. As you can see, we also get this
> hit (so the computation of the score is correct), but we also get a lot more
> (and not few better ones). Is it something we need to change about the
> parameters or do we need to filter the results afterwards?
> 
> Here is an example:
> 
> 138    22    0    0    5    349    5    228    +    Emx2os_Human    7282
> 3898    4407    chr24    50763139    31027810    31028198    7
> 66,27,8,10,7,8,34,    3898,3977,4014,4038,4048,4358,4373,
> 31027810,31027890,31027925,31028034,31028107,31028156,31028164,
> This is your best match (according to the online tool) as is in the psl
> output file, which scores 106.
> 
> 774    121    0    0    27    3588    29    60189    -    Emx2os_Human
> 7282    69    4552    chr28    44191819    30945073    31006157    33
> 8,4,17,8,32,29,51,12,22,95,17,15,31,8,38,6,6,36,38,6,16,39,10,7,24,21,22,67,10,22,144,28,6,
> 2730,2744,2749,2766,2780,2889,3578,3683,3703,3730,3950,3977,3999,4060,4068,4107,4113,4537,4729,4774,4781,6119,6164,6232,6246,6352,6373,6788,6943,6966,6992,7179,7207,
> 30945073,30945094,30945098,30945116,30945129,30945237,30945598,30945689,30945719,30945742,30945974,30946002,30946023,30946079,30946098,30946136,30946143,30946252,30946463,30946509,30946520,30946607,30946651,30946708,30946719,30946823,30946845,31005736,31005884,31005908,31005930,31006116,31006151,
> This is the highest scoring hit we got. The score is 597, but between this
> one and 106 there are many better scores. Why do you not report this hits?
> 
> Kind regards,
> ilinca tudose
> 
> 
> ------------------------------------------------------------------------
> 
> _______________________________________________
> Genome maillist  -  [email protected]
> https://lists.soe.ucsc.edu/mailman/listinfo/genome