On Wed, Oct 27, 1999 at 09:21:56PM -0400, Greg Johnson wrote:
>
> Does anyone know anything about GNQS (Generic NQS)? I would like to use
> it as a replacement for DQS (since DQS is non-free and GNQS is GPL'd),
> but I found it confusing to set up. I'm also not sure if it can cope
> with MPI/PVM jobs.
>
> In my opinion, we are really lacking a good free (DFSG) queuing system.
> Perhaps I should look at GNU queue again. Last time I looked at it, it
> wasn't really usable.
>

I'm not sure there is a good free queueing system. When last I looked, GNQS seemed to have strong support for parallel jobs on single-system-image machines (SMPs, NUMAs, etc.) but didn't handle parallel clusters well. From the mailing lists, the GNQS code sounded fairly clean, so it might be easy to add the necessary features.

If you or someone else packages GNQS, I'll modify DQS to use alternatives for the POSIX q* commands. Until I make the mods, just conflict with dqs and put the q* commands in the new package. It doesn't make much sense to have multiple batch queueing systems on one box anyway.

DQS under Debian uses ports 610, 611, and 612. I have no objections to sharing them with other batch systems that don't have IANA-assigned ports either (since both shouldn't be running at once anyway).

PBS has finally been released under a BSD-with-advertising-clause license (pbs.mrj.com). I've never used it, but since NASA paid a lot of money to replace NQS with PBS, presumably it's an improvement. Definitely worth a look now.

GNU queue isn't remotely POSIX, but that may not be a bad thing. It supports interactive jobs, unlike the POSIX batch queueing systems (DQS, *NQS). I've never been fond of qsub and its relatives. OTOH, queue seemed to have no scheduling or accounting systems to speak of. I'm unsure how it would cope with a parallel job that spawned lots of children; they'd probably escape the queue and run unfettered by the non-existent scheduling system.

PVM is going to be a problem for any clustering system due to the overlap between PVM's "virtual machine" and the queueing system's view of a cluster. DQS tries (not very hard) to set up and take down a virtual machine to run each job in, but if you allow multiple jobs on any node (SMPs, for instance) there will certainly be problems. If you run interactive PVM jobs as well, DQS may find itself unable to set up the virtual machine at all, and it may try to take down your interactive VM when the job ends.
To fix this right, PVM needs to be redesigned. I have the impression that ORNL is doing just this with their next-generation projects.

On stability, DQS-3.1.8 had big memory leaks in the master daemon. DQS-3.2.7 has few leaks (none that I've noticed), though I still restart the master daemon daily just to be safe. In a big cluster you might need to restart qmaster more often. Look at /etc/cron.daily/dqs.
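
On the port-sharing point above: if two batch systems end up sharing 610-612, it may be worth checking that nothing is already listening before starting the second one. A minimal Python sketch follows. The function name is made up for illustration, and it assumes the daemons listen on TCP, which is an assumption here and worth verifying against your DQS setup.

```python
import socket

# Ports the Debian DQS package uses, per the message above.
DQS_PORTS = (610, 611, 612)


def ports_in_use(ports, host="127.0.0.1", timeout=0.5):
    """Return the subset of `ports` that accept a TCP connection on `host`.

    Hypothetical helper, not part of DQS or any other batch system.
    """
    busy = []
    for port in ports:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
            s.settimeout(timeout)
            # connect_ex returns 0 on success instead of raising.
            if s.connect_ex((host, port)) == 0:
                busy.append(port)
    return busy


if __name__ == "__main__":
    busy = ports_in_use(DQS_PORTS)
    if busy:
        print("already listening:", busy)
    else:
        print("ports 610-612 look free")
```

This only probes by connecting, so it needs no root privileges, but it says nothing about which daemon owns a busy port.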

