To run multiple blat processes in parallel, you need to break your target
genome up into smaller pieces.  It would be very inefficient to run each
of your queries against the entire genome 2bit file.  Split the target at
least into single chromosomes; smaller pieces are even better.  There are
several ways to partition your target; faSplit is the
simplest.
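
As a rough illustration of the partitioning idea, here is a minimal Python
sketch that writes each FASTA record to its own file and then builds one
blat command line per (target piece, query file) pair.  It is a stand-in
for what faSplit would do for you, not a replacement for it.  The function
names, the file layout and the .psl naming scheme are assumptions made up
for this example; the only thing taken from blat itself is the basic
"blat database query output.psl" argument order.

```python
import itertools
from pathlib import Path


def split_fasta(fasta_path, out_dir):
    """Write each FASTA record to its own file; return the files written.

    Hypothetical helper: files are named after the first word of each
    header line, so this assumes sequence names are safe as file names.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    pieces = []
    handle = None
    with open(fasta_path) as src:
        for line in src:
            if line.startswith(">"):
                # A new record starts: close the previous piece, open the next.
                if handle:
                    handle.close()
                name = line[1:].split()[0]
                piece = out_dir / f"{name}.fa"
                pieces.append(piece)
                handle = open(piece, "w")
            if handle:
                handle.write(line)
    if handle:
        handle.close()
    return pieces


def blat_jobs(target_pieces, query_files, out_dir):
    """Build one 'blat target query output.psl' command per pair."""
    out_dir = Path(out_dir)
    jobs = []
    for target, query in itertools.product(target_pieces, query_files):
        # Assumed naming scheme: <target>.<query>.psl
        psl = out_dir / f"{Path(target).stem}.{Path(query).stem}.psl"
        jobs.append(["blat", str(target), str(query), str(psl)])
    return jobs
```

Each entry in the returned list is an independent job, so the list can be
handed to whatever batch system you have.  Coordinates in the resulting
psl files are relative to each piece, which is harmless when the pieces
are whole chromosomes but needs lifting back if you cut below that.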

The same goes for your query sequences if they happen to be very large.
Typically query sequences are already pretty small, since they are
usually things such as cDNAs.

If you are trying to run genome-to-genome alignments, it is better
to use lastz instead of blat.
http://www.bx.psu.edu/miller_lab/dist/README.lastz-1.02.00/README.lastz-1.02.00a.html

If you need job-control software for your supercomputer, parasol is
a simple system to manage multiple nodes and individual CPUs.
http://users.soe.ucsc.edu/~donnak/eng/parasol.htm
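
If you only have a single multi-core machine and no cluster manager, the
same idea can be sketched in a few lines of Python.  This is not parasol
and does not use its interface; it is just an assumed, minimal runner that
takes a list of command lines (for example one blat command per target
piece) and keeps a fixed number of them running at once.  The function
name and the default of 4 workers are made up for this example.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def run_jobs(jobs, max_workers=4):
    """Run each command line in jobs, at most max_workers at a time.

    jobs is a list of argument lists, e.g. ["blat", target, query, out].
    Returns the exit codes in the same order as the input jobs.
    """
    def run_one(cmd):
        # Each job is an independent process; the thread only waits on it.
        return subprocess.run(cmd).returncode

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_one, jobs))
```

A non-zero exit code marks a job that needs to be looked at or rerun.
Pick max_workers with memory in mind as well as core count, since every
running process holds its own target piece.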

--Hiram

----- Original Message -----
From: "Assaf Gordon" <[email protected]>
To: [email protected]
Sent: Monday, July 5, 2010 12:27:41 PM GMT -08:00 US/Canada Pacific
Subject: [Genome] Running BLATs in parallel

Hello,

A while ago there was a mention of running BLATs in parallel on a supercomputer 
( https://lists.soe.ucsc.edu/pipermail/genome/2010-June/022692.html ).

I'd like to ask, if possible, what is the method you're using to run BLAT
in parallel?
Are you running multiple BLAT instances (on same node/multiple cores, or
multiple nodes),
or is it some gfServer/gfClient configuration?

Specifically,
I'm wondering about memory usage:
If I run multiple BLAT processes on a single machine (with same parameters), 
the 2bit database will get loaded multiple times and consume a lot of memory.

Any hint or advice will be appreciated,

thanks,
 -gordon

 
_______________________________________________
Genome maillist  -  [email protected]
https://lists.soe.ucsc.edu/mailman/listinfo/genome
