Hi Avi - blat really is not the best tool for primate/rodent alignments. I'd suggest you switch to lastz from Penn State University. See http://www.bx.psu.edu/miller_lab/dist/README.lastz-1.01.50/README.lastz-1.01.50.html .
On Mar 9, 2010, at 7:58 AM, Fungazid wrote:

> Thank you Galt for your detailed information,
>
> I understand the optimal configuration depends on the needs. So... my
> query sequences are cDNAs of 100-5000bp. One of the goals is to
> detect variations like intron retention between related mammals like
> primates vs. rodents (therefore I need genomes as targets).
> The basic configuration finds most but not all HSPs per hit
> (accordingly, sometimes small exons are not detected, or larger
> intronic regions). But the optimization is problematic because I see
> that often even -stepSize=5 is less sensitive than the default
> stepSize. As far as I understand, this can happen because
> repetitive sequences are ignored if they occur too many times
> when sensitivity rises. Should I increase -repMatch to prevent it?
> And what is the program's default repMatch for
> [-stepSize=5,-tileSize=10] and for [-stepSize=5,-tileSize=default]?
>
> thanks,
> Avi
>
> --- On Mon, 3/8/10, Galt Barber <[email protected]> wrote:
>
>> From: Galt Barber <[email protected]>
>> Subject: Re: [Genome] gfServer/gfClient and -tileSize
>> To: [email protected]
>> Date: Monday, March 8, 2010, 7:35 PM
>>
>> Higher tileSize increases memory,
>> increases speed, decreases sensitivity slightly.
>>
>> The default tileSize 11 is very good.
>> On rare occasions you see 10 or 12 used.
>> Smaller tileSizes tend to lead to
>> dramatically longer runtime.
>>
>> It's a little complex to state easily
>> in a formula because there are multiple
>> phases internally that each have different
>> characteristics.
>>
>> The default stepSize is just tileSize.
>> This means that you are sampling a
>> position of the genome every stepSize bases.
>>
>> For PCR primer searching, we leave tileSize at 11
>> and lower stepSize to 5 for increased
>> sensitivity. Of course this will also
>> cause the runtime to grow.
>>
>> Increasing sensitivity means increasing
>> the number of hits, and each hit that
>> has to be explored can take a lot of
>> processing.
>>
>> And of course, whatever generalizations
>> one would make, the real power, speed,
>> and memory required will depend
>> on the characteristics of the genome and
>> the queries. Not to mention several command-line
>> switches that are available.
>>
>> But luckily the defaults have good
>> performance and sensitivity
>> for a wide range of applications.
>>
>> If you are doing short-reads then
>> perhaps one of the many good freely
>> available short-read aligners like
>> would be useful.
>>
>> BLAT is free for non-commercial use.
>>
>> -Galt
>>
>> On 3/8/2010 7:03 AM, Fungazid wrote:
>>> Hello people,
>>>
>>> About gfServer/gfClient:
>>>
>>> I see that higher -tileSize leads to a higher memory
>>> requirement. Is higher -tileSize expected to decrease
>>> detection power?
>>> In addition, should higher -tileSize enhance the speed
>>> of gfServer/gfClient?
>>>
>>> And what is -stepSize, and how does it affect
>>> detection power, speed and memory requirement?
>>>
>>> Thanks,
>>> Avi
>>>
>>> _______________________________________________
>>> Genome maillist - [email protected]
>>> https://lists.soe.ucsc.edu/mailman/listinfo/genome
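
A note on the stepSize explanation quoted above ("you are sampling a position of the genome every stepSize bases"): the small Python sketch below is only a toy model of that sampling, not BLAT's actual indexing code. The function name tile_starts and the 1000-base example length are made up for illustration. It assumes tiles start at offset 0 and that a tile must fit entirely inside the sequence.

```python
def tile_starts(genome_length, tile_size=11, step_size=None):
    """Return the start offsets a tile index would sample (toy model only).

    Assumption from the thread: when step_size is not given, it
    defaults to tile_size, so tiles do not overlap.
    """
    if step_size is None:
        step_size = tile_size
    if tile_size <= 0 or step_size <= 0:
        raise ValueError("tile_size and step_size must be positive")
    # A tile starting at offset s covers s .. s + tile_size - 1,
    # so the last valid start is genome_length - tile_size.
    return range(0, genome_length - tile_size + 1, step_size)


if __name__ == "__main__":
    default_tiles = tile_starts(1000)            # stepSize = tileSize = 11
    dense_tiles = tile_starts(1000, step_size=5)  # tileSize 11, stepSize 5
    print(len(default_tiles), len(dense_tiles))
```

In this toy model, lowering stepSize from 11 to 5 on a 1000-base sequence raises the number of sampled tiles from 90 to 198, which is consistent with the remark that a smaller stepSize gives more hits to explore and a longer runtime. It says nothing about repMatch or the other internal phases mentioned in the thread.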
