Thanks for the advice. Nevertheless, I am in no position to decide what software the cluster will run; I just have to deal with what I have. Still, I can suggest other possibilities.
2009/3/4, Vincent Schut <[email protected]>:
> John Barham wrote:
>
>> On Tue, Mar 3, 2009 at 3:52 AM, hugo rivera <[email protected]> wrote:
>>
>>> I have to launch many tasks running in parallel (~5000) in a
>>> cluster running linux. Each of the tasks performs some astronomical
>>> calculations and I am not sure whether using fork is the best answer
>>> here. First of all, all the programming is done in python and c...
>>
>> Take a look at the multiprocessing package
>> (http://docs.python.org/library/multiprocessing.html), newly
>> introduced with Python 2.6 and 3.0:
>>
>> "multiprocessing is a package that supports spawning processes using
>> an API similar to the threading module. The multiprocessing package
>> offers both local and remote concurrency, effectively side-stepping
>> the Global Interpreter Lock by using subprocesses instead of threads."
>>
>> It should be a quick and easy way to set up a cluster-wide job
>> processing system (provided all your jobs are driven by Python).
>
> Better: use parallelpython (www.parallelpython.org). Afaik multiprocessing
> is geared towards multi-core systems (one machine), while pp is also
> suitable for real clusters with more PCs. No special cluster software
> is needed. It will start (here's your fork) one or more python interpreters
> on each node, and then you can submit jobs to those 'workers'. The
> interpreters are kept alive between jobs, so the startup penalty becomes
> negligible when the number of jobs is large enough.
> We are using it here to process massive amounts of satellite data; it
> works like a charm.
>
> Vincent.
>
>> It also looks like it's been (partially?) back-ported to Python 2.4
>> and 2.5: http://pypi.python.org/pypi/processing.
>>
>> John

-- Hugo
