Thanks Galt,

This is very helpful. Thanks also for the suggestion of LASTZ, which I see is also a very nice tool.

Avi

--- On Tue, 3/9/10, Galt Barber <[email protected]> wrote:

> From: Galt Barber <[email protected]>
> Subject: Re: [Genome] gfServer/gfClient and -tileSize
> To: [email protected]
> Date: Tuesday, March 9, 2010, 8:44 PM
>
> >> -stepSize=5 is less sensitive than the default stepSize.
>
> This does not seem generally true. Of course it may be that blat
> sees many new things at stepSize 5 compared to 11,
> but misses a few old things that it used to see.
> It is after all sampling every 5th position of the target
> genome instead of every 11th position. That is all.
>
> In general, blat is good for cDNA and RNA of the size you mentioned
> (100-500bp). However, as Jim pointed out, as the %Identity drops
> over greater evolutionary distance, it's harder for BLAT to find
> the exact tile hits, which reduces its sensitivity.
> Lastz tends to do better for human-rodent distances or greater.
>
> You can try various things to increase BLAT's sensitivity,
> but you may find that it runs much slower at high-sensitivity
> settings. This could make it 10x to 100x slower than the default.
>
> Certainly setting -repMatch higher may help with borderline repetitive
> regions, but again at a time cost.
>
> Here is the default formula for repMatch:
> repMatch = 1024 * (tileSize/stepSize).
> You can increase it from there.
>
> You might also run it with or without -fine
> and see if that helps you get more exons.
>
> You could also try these.
>
> -oneOff=N   If set to 1 this allows one mismatch in tile and still
>             triggers an alignment. Default is 0.
> -minMatch=N sets the number of tile matches. Usually set from 2 to 4.
>             Default is 2 for nucleotide, 1 for protein.
> -maxGap=N   sets the size of maximum gap between tiles in a clump.
>             Usually set from 0 to 3. Default is 2.
>             Only relevant for minMatch > 1.
>
> As noted before, extra sensitivity runs slower:
> oneOff=1
> minMatch=1
> minMatch=2 maxGap=3
>
> -Galt
>
> On 3/9/2010 7:59 AM, Fungazid wrote:
> > Thanks Jim,
> >
> > I am looking into LASTZ and will try to use it in place of, or
> > alongside, Blat in my script.
> > I need to see if it is fast enough for large-scale search with my
> > computer and if it can be used and parsed like Blast and Blat.
> > Still, at this point, trying to optimize Blat could be helpful for me
> > because it tends to find most hits.
> >
> > Avi
> >
> > --- On Tue, 3/9/10, Jim Kent<[email protected]> wrote:
> >
> >> From: Jim Kent<[email protected]>
> >> Subject: Re: [Genome] gfServer/gfClient and -tileSize
> >> To: "Fungazid"<[email protected]>
> >> Cc: [email protected], [email protected]
> >> Date: Tuesday, March 9, 2010, 3:46 PM
> >>
> >> Hi Avi - blat really is not the best tool for primate/rodent
> >> alignments. I'd suggest you switch to lastz from Penn State
> >> University. See
> >> http://www.bx.psu.edu/miller_lab/dist/README.lastz-1.01.50/README.lastz-1.01.50.html.
> >>
> >> On Mar 9, 2010, at 7:58 AM, Fungazid wrote:
> >>
> >>> Thank you Galt for your detailed information,
> >>>
> >>> I understand the optimal configuration depends on the needs. So...
> >>> my query sequences are cDNAs of 100-5000bp. One of the goals is to
> >>> detect variations like intron retention between related mammals
> >>> like primates vs. rodents (therefore I need genomes as targets).
> >>> The basic configuration finds most but not all HSPs per hit
> >>> (accordingly sometimes small exons are not detected, or larger
> >>> intronic regions). But the optimization is problematic because I
> >>> see that often even -stepSize=5 is less sensitive than the default
> >>> stepSize. As far as I understand this can happen because of
> >>> repetitive sequences that are ignored if they occur too many times
> >>> when sensitivity rises.
> >>> Should I increase -repMatch to prevent it? But which value is the
> >>> program default repMatch for [-stepSize=5,-tileSize=10] and for
> >>> [-stepSize=5,-tileSize=default]?
> >>>
> >>> thanks,
> >>> Avi
> >>>
> >>> -repMatch
> >>>
> >>> --- On Mon, 3/8/10, Galt Barber<[email protected]> wrote:
> >>>
> >>>> From: Galt Barber<[email protected]>
> >>>> Subject: Re: [Genome] gfServer/gfClient and -tileSize
> >>>> To: [email protected]
> >>>> Date: Monday, March 8, 2010, 7:35 PM
> >>>>
> >>>> Higher tileSize increases memory,
> >>>> increases speed, decreases sensitivity slightly.
> >>>>
> >>>> The default tileSize 11 is very good.
> >>>> On rare occasions you see 10 or 12 used.
> >>>> Smaller tileSizes tend to lead to
> >>>> dramatically longer runtime.
> >>>>
> >>>> It's a little complex to state easily
> >>>> in a formula because there are multiple
> >>>> phases internally that each have different
> >>>> characteristics.
> >>>>
> >>>> The default stepSize is just tileSize.
> >>>> This means that you are sampling a
> >>>> position of the genome every stepSize bases.
> >>>>
> >>>> For PCR primer searching, we leave tileSize at 11
> >>>> and lower stepSize to 5 for increased
> >>>> sensitivity. Of course this will also
> >>>> cause the runtime to grow.
> >>>>
> >>>> Increasing sensitivity means increasing
> >>>> the number of hits, and each hit that
> >>>> has to be explored can take a lot of
> >>>> processing.
> >>>>
> >>>> And of course, whatever generalizations
> >>>> one would make, the real power, speed,
> >>>> and memory required will depend
> >>>> on the characteristics of the genome
> >>>> and the queries, not to mention several command-line
> >>>> switches that are available.
> >>>>
> >>>> But luckily the defaults have good
> >>>> performance and sensitivity
> >>>> for a wide range of applications.
> >>>>
> >>>> If you are doing short-reads then
> >>>> perhaps one of the many good freely
> >>>> available short-read aligners like
> >>>> would be useful.
> >>>>
> >>>> BLAT is free for non-commercial use.
> >>>>
> >>>> -Galt
> >>>>
> >>>> On 3/8/2010 7:03 AM, Fungazid wrote:
> >>>>> Hello people,
> >>>>>
> >>>>> About gfServer/gfClient:
> >>>>>
> >>>>> I see that higher -tileSize leads to higher memory
> >>>>> requirement. Is higher -tileSize expected to decrease
> >>>>> detection power?
> >>>>> In addition, should higher -tileSize enhance the speed
> >>>>> of gfServer/gfClient?
> >>>>>
> >>>>> And, what is the -stepSize and how does it affect the
> >>>>> detection power, speed and memory requirement?
> >>>>>
> >>>>> Thanks,
> >>>>> Avi
> >>>>>
> >>>>> _______________________________________________
> >>>>> Genome maillist - [email protected]
> >>>>> https://lists.soe.ucsc.edu/mailman/listinfo/genome

_______________________________________________
Genome maillist - [email protected]
https://lists.soe.ucsc.edu/mailman/listinfo/genome
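
For reference, the default repMatch formula quoted in the thread, repMatch = 1024 * (tileSize/stepSize), and the "sampling every stepSize bases" description can be worked through for the two configurations asked about. This is only a rough sketch of the arithmetic, not BLAT's own code. The function names are hypothetical, truncating to an integer is an assumption, and the tile count assumes tiles start at position 0 and every stepSize bases after that. The built-in default in a given blat build may depend on tileSize in ways the quoted formula does not capture, so treat the results as what the formula gives and check the program's usage message before relying on them.

```python
from typing import Optional


def default_rep_match(tile_size: int = 11, step_size: Optional[int] = None) -> int:
    """repMatch from the formula quoted in the thread: 1024 * (tileSize / stepSize)."""
    if step_size is None:
        # The thread states that the default stepSize is just tileSize.
        step_size = tile_size
    if tile_size <= 0 or step_size <= 0:
        raise ValueError("tile_size and step_size must be positive")
    # Assumption: the result is truncated to a whole number.
    return (1024 * tile_size) // step_size


def sampled_tile_count(target_len: int, tile_size: int = 11,
                       step_size: Optional[int] = None) -> int:
    """Number of tile start positions sampled from a target of target_len bases.

    Assumption (illustrative only): tiles start at position 0 and then
    every step_size bases, as long as a full tile still fits.
    """
    if step_size is None:
        step_size = tile_size
    if tile_size <= 0 or step_size <= 0:
        raise ValueError("tile_size and step_size must be positive")
    if target_len < tile_size:
        return 0
    return (target_len - tile_size) // step_size + 1


if __name__ == "__main__":
    # The two configurations asked about, plus the all-default case.
    for tile, step in [(11, 11), (11, 5), (10, 5)]:
        print(f"tileSize={tile} stepSize={step} "
              f"repMatch={default_rep_match(tile, step)} "
              f"tiles per 1000 bases={sampled_tile_count(1000, tile, step)}")
```

By the quoted formula, -stepSize=5 with the default tileSize of 11 gives 2252, and -stepSize=5 with -tileSize=10 gives 2048. Either number would be the starting point to raise -repMatch from if borderline repetitive regions are being dropped.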
