On Wed, Oct 18, 2000 at 03:56:08PM +0200, Thimo Neubauer wrote:
> Hello,
>
> we are running a 80PC cluster and we plan to use some sort of queuing
> system to run the parallel programs. What are the known free systems
> and what are your experiences?
>
> The second question is, if anyone on this list already used the
> SCore-system? Does it work with Debian? Is anyone packaging it?

I've tried various free systems. GNU queue looks nice in theory, but is
lacking in functionality. It also had a serious bug on Linux systems
which caused it to be removed from Debian 2.2.

NQS derivatives are all hopeless, IMHO. They perform host selection at
job submission time rather than job execution time, which is a really
broken idea. They're fine for controlling queues on a single large
parallel machine, but they're no good for managing clusters.

I bit the bullet and bought Platform Computing's LSF. It's got one or
two bugs, but you get what you pay for - the technical support is first
rate, and the feature list is great. It's particularly good if you want
to cycle-steal from workstations on people's desks; you can configure
machines to be part of the cluster only at certain times of day, or
only when there are no users logged in, or according to several other
load index measures. You can even add in your own. It handles
heterogeneous networks beautifully; I run a mixed Solaris/Linux cluster
here.

The base product does not support launching MPI jobs sensibly, but
there is an add-on module (LSF Parallel) to do that.

I realise some may think it's anathema to talk about commercial
software on a Debian mailing list, but I've bought this software for
three different groups of people, with widely varying requirements,
over the last three years or so and have never regretted it.

Tim.
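
To illustrate the point above about when host selection happens, here
is a minimal Python sketch. It is only an illustration of the general
idea, not how NQS, GNU queue or LSF are actually implemented; the
function names, host names and load figures are all made up.

```python
# Hypothetical illustration: none of these names come from NQS, GNU queue or LSF.

def least_loaded(loads):
    """Return the name of the host with the lowest load value."""
    return min(loads, key=loads.get)

def bind_at_submission(loads_at_submit, loads_at_run):
    # The host is fixed when the job is queued; later load changes are ignored.
    return least_loaded(loads_at_submit)

def bind_at_execution(loads_at_submit, loads_at_run):
    # The host is chosen only when the job is about to start.
    return least_loaded(loads_at_run)

# Made-up load figures: node1 is idle when the job is submitted,
# but busy by the time the job reaches the head of the queue.
loads_at_submit = {"node1": 0.1, "node2": 0.9}
loads_at_run = {"node1": 2.5, "node2": 0.2}

early = bind_at_submission(loads_at_submit, loads_at_run)
late = bind_at_execution(loads_at_submit, loads_at_run)
print("submission-time choice:", early)
print("execution-time choice:", late)
```

With these figures the submission-time scheduler stays committed to
node1 even though it has become the busier machine, while the
execution-time scheduler picks node2. On a cluster where jobs may wait
in the queue for a long time, that gap is the problem described above.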

