To run multiple blat processes in parallel, you need to break your target genome up into smaller pieces. It would be very inefficient to run each of your queries against the entire genome 2bit file. Split the target at least into single chromosomes, or better still into smaller pieces. There are several ways to partition the target; faSplit is the simplest.
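As a rough illustration of the partitioning idea above, here is a minimal Python sketch that cuts a target FASTA file into fixed-size pieces and builds one blat command line per piece. It is not how faSplit works internally; the function names (read_fasta, split_target, blat_jobs), the piece naming scheme, and the file paths are all hypothetical. The only external assumption is the basic blat invocation of the form "blat database query output.psl".

```python
from pathlib import Path


def read_fasta(path):
    """Yield (header, sequence) pairs from a FASTA file."""
    header, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.rstrip("\n")
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            elif line:
                chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def split_target(fasta_path, out_dir, piece_size):
    """Write each target sequence as pieces of at most piece_size bases.

    Pieces are named <seq>_<start>.fa (hypothetical scheme), where start
    is the 0-based offset in the original sequence, so that hits can be
    mapped back to genome coordinates later.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    pieces = []
    for header, seq in read_fasta(fasta_path):
        name = header.split()[0]  # sequence ID is the first word of the header
        for start in range(0, len(seq), piece_size):
            piece = out_dir / f"{name}_{start}.fa"
            piece.write_text(f">{name}_{start}\n{seq[start:start + piece_size]}\n")
            pieces.append(piece)
    return pieces


def blat_jobs(pieces, query_path, psl_dir):
    """Return one 'blat target query output.psl' command line per piece."""
    psl_dir = Path(psl_dir)
    return [f"blat {p} {query_path} {psl_dir / (p.stem + '.psl')}" for p in pieces]
```

The resulting command lines could be written one per line to a job list and handed to whatever batch system you use. Note that this sketch makes non-overlapping pieces, so an alignment that spans a piece boundary may be missed or truncated; in practice you would probably want the pieces to overlap somewhat.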
The same applies to your query sequences if they happen to be very large. Typically, query sequences are already fairly small, since they are usually things such as cDNAs. If you are trying to run genome-to-genome alignments, it is better to use lastz instead of blat:
http://www.bx.psu.edu/miller_lab/dist/README.lastz-1.02.00/README.lastz-1.02.00a.html

If you need controlling software for your supercomputer, parasol is a simple system for managing multiple nodes and individual CPUs:
http://users.soe.ucsc.edu/~donnak/eng/parasol.htm

--Hiram

----- Original Message -----
From: "Assaf Gordon" <[email protected]>
To: [email protected]
Sent: Monday, July 5, 2010 12:27:41 PM GMT -08:00 US/Canada Pacific
Subject: [Genome] Running BLATs in parallel

Hello,

A while ago there was a mention of running BLATs in parallel on a supercomputer ( https://lists.soe.ucsc.edu/pipermail/genome/2010-June/022692.html ).

I'd like to ask, if possible, what is the method you're using to run BLAT in parallel? Are you running multiple BLAT instances (on same node/multiple cores, or multiple nodes), or is it some gfServer/gfClient configuration?

Specifically, I'm wondering about memory usage: if I run multiple BLAT processes on a single machine (with the same parameters), the 2bit database will get loaded multiple times and consume a lot of memory.

Any hint or advice will be appreciated,
thanks,
-gordon
_______________________________________________
Genome maillist - [email protected]
https://lists.soe.ucsc.edu/mailman/listinfo/genome
