>> -stepSize=5 is less sensitive than the default stepSize.
This does not seem generally true. Of course it may be that blat
sees many new things at stepSize 5 compared to 11,
but misses a few old things that it used to see.
It is after all sampling every 5th position of the target
genome instead of every 11th position. That is all.
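To make the "every 5th vs every 11th position" point concrete, here is a minimal Python sketch under a simplified model. It assumes the target index holds only tiles that start at multiples of stepSize, and that a single tile hit is enough (as with -minMatch=1; the nucleotide default of 2 needs more). Under that model, any exact match of tileSize + stepSize - 1 bases is guaranteed to contain an indexed tile. That is 21 bp at the default step of 11, but only 15 bp at a step of 5. The function names are hypothetical.

```python
def guaranteed_hit_length(tile_size: int, step_size: int) -> int:
    """Shortest exact match that always contains at least one indexed tile."""
    return tile_size + step_size - 1


def contains_indexed_tile(match_start: int, match_len: int,
                          tile_size: int, step_size: int) -> bool:
    """True if [match_start, match_start + match_len) fully contains a tile
    starting at a multiple of step_size, i.e. a tile that was indexed."""
    # First indexed tile start at or after match_start (ceiling division).
    first = -(-match_start // step_size) * step_size
    return first + tile_size <= match_start + match_len


def always_hits(match_len: int, tile_size: int, step_size: int) -> bool:
    """Check every offset of the match relative to the sampling grid."""
    return all(contains_indexed_tile(start, match_len, tile_size, step_size)
               for start in range(step_size))


for step in (11, 5):
    n = guaranteed_hit_length(11, step)
    # n bases always suffice; one base fewer can miss every indexed tile.
    assert always_hits(n, 11, step) and not always_hits(n - 1, 11, step)
    print(f"tileSize=11 stepSize={step}: any exact match of {n} bp hits")
```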
In general, BLAT is good for cDNA and RNA of the size you mentioned
(100-5000bp). However, as Jim pointed out, as the %identity drops
over greater evolutionary distances, it becomes harder for BLAT to find
the exact tile hits, which reduces its sensitivity. LASTZ tends to do
better for human-rodent distances or greater.
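A rough back-of-the-envelope model of why identity matters so much: if each aligned base matches independently with probability p, an 11-base tile is conserved exactly with probability p^11, which falls quickly as identity drops. This sketch assumes independent substitutions and ignores indels; the identities and the 100 bp exon are illustrative values, not measurements, and the helper names are hypothetical.

```python
def exact_tile_prob(identity: float, tile_size: int = 11) -> float:
    """Chance that tile_size aligned bases are all identical, assuming each
    base matches independently with probability `identity`."""
    return identity ** tile_size


def expected_tile_hits(block_len: int, identity: float,
                       tile_size: int = 11, step_size: int = 11) -> float:
    """Approximate number of indexed tiles matching exactly inside an
    ungapped aligned block (for example, one exon) of block_len bases."""
    indexed_tiles = max(0, block_len - tile_size + 1) / step_size
    return indexed_tiles * exact_tile_prob(identity, tile_size)


for pid in (0.95, 0.85, 0.75):  # illustrative identities only
    print(f"{pid:.0%} identity: P(exact tile) = {exact_tile_prob(pid):.3f}, "
          f"expected hits in a 100 bp exon = {expected_tile_hits(100, pid):.2f}")
```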
You can try various things to increase BLAT's sensitivity,
but you may find that it runs much slower at high-sensitivity
settings, possibly 10x to 100x slower than the default.
Setting -repMatch higher may certainly help with borderline repetitive
regions, but again at a time cost.
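Part of that time cost comes from chance tile matches that have to be explored. The following is a simplified estimate that assumes uniform random DNA with no repeats (real genomes do worse, which is what -repMatch guards against). It also assumes every query offset is looked up, the target contributes genomeLength/stepSize indexed tiles, and each pair matches with probability 4^-tileSize. The 3 Gb genome and 1 kb query are illustrative. This counts raw tile hits before any minMatch clumping, so it shows relative scaling, not actual runtime.

```python
def expected_random_hits(query_len: int, genome_len: int,
                         tile_size: int, step_size: int) -> float:
    """Expected chance tile matches between a query and a stepped target
    index, assuming both are uniform random DNA with no repeats."""
    query_tiles = max(0, query_len - tile_size + 1)  # every query offset
    target_tiles = genome_len / step_size            # one tile per step
    return query_tiles * target_tiles / 4 ** tile_size


GENOME = 3_000_000_000  # roughly mammal-sized; illustrative only
base = expected_random_hits(1000, GENOME, 11, 11)
for k, s in [(11, 11), (11, 5), (10, 5)]:
    hits = expected_random_hits(1000, GENOME, k, s)
    print(f"tileSize={k} stepSize={s}: ~{hits:,.0f} raw hits "
          f"({hits / base:.1f}x the default)")
```

Under this model, dropping stepSize from 11 to 5 multiplies the raw hit count by 11/5 = 2.2, and each unit decrease in tileSize multiplies it by about 4.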
Here is the default formula for repMatch:
repMatch = 1024 * (tileSize/stepSize).
You can increase it from there.
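Applying the formula above to the configurations asked about gives a starting point. This is a sketch: multiplying first and then using integer division is an assumption. BLAT's own usage text quotes typical repMatch values that differ by tileSize, with 1024 corresponding to the default tileSize of 11, so check the tileSize=10 figure in particular against your BLAT version.

```python
def default_rep_match(tile_size: int, step_size: int, base: int = 1024) -> int:
    """repMatch = base * (tileSize / stepSize), per the formula above.

    Multiplying first and then floor-dividing is an assumption; base=1024
    is the value quoted for the default tileSize of 11.
    """
    return base * tile_size // step_size


print(default_rep_match(11, 11))  # 1024: the defaults
print(default_rep_match(11, 5))   # 2252: -stepSize=5, default tileSize
print(default_rep_match(10, 5))   # 2048: -stepSize=5 -tileSize=10
```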
You might also run it with or without -fine
and see if that helps you get more exons.
You could also try these options:
  -oneOff=N    If set to 1, this allows one mismatch in a tile and still
               triggers an alignment. Default is 0.
  -minMatch=N  Sets the number of tile matches. Usually set from 2 to 4.
               Default is 2 for nucleotide, 1 for protein.
  -maxGap=N    Sets the size of the maximum gap between tiles in a clump.
               Usually set from 0 to 3. Default is 2.
               Only relevant for minMatch > 1.
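To see why -oneOff=1 helps at lower identity, and why it also costs time, here is the same independent-mismatch model extended to tiles with at most one mismatch. At identity p, allowing one mismatch multiplies the chance that a tile triggers by 1 + tileSize(1-p)/p. For unrelated random sequence (p = 0.25), that factor is 34 for an 11-base tile, so spurious hits grow much faster than real ones. This is a simplified model, not BLAT's implementation, and the function name is hypothetical.

```python
from math import comb


def tile_trigger_prob(identity: float, tile_size: int = 11,
                      mismatches_allowed: int = 0) -> float:
    """Probability that an aligned tile has at most `mismatches_allowed`
    mismatches, with independent per-base identity (a simplified model)."""
    return sum(comb(tile_size, m) * (1 - identity) ** m
               * identity ** (tile_size - m)
               for m in range(mismatches_allowed + 1))


for pid in (0.9, 0.8, 0.7):  # illustrative identities only
    exact = tile_trigger_prob(pid)
    one_off = tile_trigger_prob(pid, mismatches_allowed=1)
    print(f"{pid:.0%}: exact {exact:.3f}, oneOff=1 {one_off:.3f} "
          f"({one_off / exact:.1f}x)")

# Unrelated random DNA matches a given base with probability 0.25.
noise = tile_trigger_prob(0.25, mismatches_allowed=1) / tile_trigger_prob(0.25)
print(f"random-sequence tile hits grow ~{noise:.0f}x with oneOff=1")
```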
As noted before, these extra-sensitivity settings run slower:
  -oneOff=1
  -minMatch=1
  -minMatch=2 -maxGap=3
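One way to compare these settings, with and without -fine, is to run the same query under each and see which recovers more exons. The following minimal Python sketch only builds the command lines. It assumes blat's usual "blat database query output.psl" argument order; the file names and run labels are hypothetical placeholders. Each list could be passed to subprocess.run if blat is on your PATH.

```python
# Hypothetical file names: substitute your own target, query, and output.
TARGET, QUERY = "target.2bit", "query.fa"

# The extra-sensitivity settings listed above, one run each.
SETTINGS = {
    "oneOff1": ["-oneOff=1"],
    "minMatch1": ["-minMatch=1"],
    "minMatch2_maxGap3": ["-minMatch=2", "-maxGap=3"],
}


def blat_command(name: str, extra: list[str], fine: bool = False) -> list[str]:
    """Build a blat argument list: options, then database, query, output."""
    opts = list(extra) + (["-fine"] if fine else [])
    out = f"{name}_fine.psl" if fine else f"{name}.psl"
    return ["blat", *opts, TARGET, QUERY, out]


for name, extra in SETTINGS.items():
    for fine in (False, True):
        print(" ".join(blat_command(name, extra, fine)))
```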
-Galt
On 3/9/2010 7:59 AM, Fungazid wrote:
> Thanks Jim,
>
> I am looking into LASTZ and will try to replace BLAT with it, or combine
> the two, in my script. I need to see whether it is fast enough for
> large-scale searches on my computer, and whether it can be used and
> parsed like BLAST and BLAT.
> Still, at this point, trying to optimize BLAT could be helpful for me
> because it tends to find most hits.
>
> Avi
>
> --- On Tue, 3/9/10, Jim Kent <[email protected]> wrote:
>
>> From: Jim Kent <[email protected]>
>> Subject: Re: [Genome] gfServer/gfClient and -tileSize
>> To: "Fungazid" <[email protected]>
>> Cc: [email protected], [email protected]
>> Date: Tuesday, March 9, 2010, 3:46 PM
>> Hi Avi - blat really is not the best
>> tool for primate/rodent alignments. I'd suggest you
>> switch to lastz from Penn State University. See
>> http://www.bx.psu.edu/miller_lab/dist/README.lastz-1.01.50/README.lastz-1.01.50.html.
>>
>>
>>
>> On Mar 9, 2010, at 7:58 AM, Fungazid wrote:
>>
>>> Thank you, Galt, for the detailed information.
>>>
>>> I understand the optimal configuration depends on the needs. So... my
>>> query sequences are cDNAs of 100-5000bp. One of the goals is to detect
>>> variations like intron retention between related mammals such as
>>> primates vs. rodents (therefore I need genomes as targets).
>>> The basic configuration finds most but not all HSPs per hit
>>> (accordingly, small exons, or larger intronic regions, are sometimes
>>> not detected). But the optimization is problematic because I see that
>>> often even -stepSize=5 is less sensitive than the default stepSize.
>>> As far as I understand, this can happen because repetitive sequences
>>> are ignored if they occur too many times when sensitivity rises.
>>> Should I increase -repMatch to prevent this? If so, what is the
>>> program's default repMatch for [-stepSize=5, -tileSize=10] and for
>>> [-stepSize=5, -tileSize=default]?
>>>
>>> thanks,
>>> Avi
>>>
>>>
>>> --- On Mon, 3/8/10, Galt Barber <[email protected]> wrote:
>>>
>>>> From: Galt Barber <[email protected]>
>>>> Subject: Re: [Genome] gfServer/gfClient and -tileSize
>>>> To: [email protected]
>>>> Date: Monday, March 8, 2010, 7:35 PM
>>>>
>>>> Higher tileSize increases memory use,
>>>> increases speed, and decreases sensitivity slightly.
>>>>
>>>> The default tileSize 11 is very good.
>>>> On rare occasions you see 10 or 12 used.
>>>> Smaller tileSizes tend to lead to
>>>> dramatically longer runtime.
>>>>
>>>> It's a little complex to state easily
>>>> in a formula because there are multiple
>>>> phases internally that each have
>>>> different characteristics.
>>>>
>>>> The default stepSize is just tileSize.
>>>> This means that you are sampling a
>>>> position of the genome every stepSize bases.
>>>>
>>>> For PCR primer searching, we leave tileSize at 11
>>>> and lower stepSize to 5 for increased
>>>> sensitivity. Of course this will also
>>>> cause the runtime to grow.
>>>>
>>>> Increasing sensitivity means increasing
>>>> the number of hits, and each hit that
>>>> has to be explored can take a lot of
>>>> processing.
>>>>
>>>> And of course, whatever generalizations
>>>> one would make, the real power, speed,
>>>> and memory required will depend
>>>> on the characteristics of the genome and
>>>> the queries, not to mention the several
>>>> command-line switches that are available.
>>>>
>>>> But luckily the defaults have good
>>>> performance and sensitivity
>>>> for a wide range of applications.
>>>>
>>>> If you are aligning short reads, then
>>>> perhaps one of the many good, freely
>>>> available short-read aligners
>>>> would be useful.
>>>>
>>>> BLAT is free for non-commercial use.
>>>>
>>>> -Galt
>>>>
>>>> On 3/8/2010 7:03 AM, Fungazid wrote:
>>>>> Hello people,
>>>>>
>>>>>
>>>>> About gfServer/gfClient:
>>>>>
>>>>> I see that a higher -tileSize leads to a higher memory requirement.
>>>>> Is a higher -tileSize expected to decrease detection power?
>>>>> In addition, should a higher -tileSize increase the speed of
>>>>> gfServer/gfClient?
>>>>>
>>>>> And what is -stepSize, and how does it affect detection power,
>>>>> speed, and memory requirements?
>>>>>
>>>>>
>>>>> Thanks,
>>>>> Avi
>>>>>
>>>>> _______________________________________________
>>>>> Genome maillist - [email protected]
>>>>> https://lists.soe.ucsc.edu/mailman/listinfo/genome
>>>>