On Thu, Feb 22, 2001 at 12:31:34PM -0500, Camm Maguire wrote:
>
> Greetings!  Subject says it all.  Can we learn/implement any
> techniques from Scyld for Debian Beowulf clusters?
>

From what I've read on the same mailing lists, I understand that Scyld
is using either a modified PBS or OpenPBS. We don't have packages of
it. At the moment OpenPBS is very non-free, but its license has an
expiration clause that eliminates most or all of the non-free clauses
in December (subject to a debate on debian-legal about the advertising
clause).

What we have right now is DQS, which is slightly non-free (no
commercial distribution) and on the ropes development-wise (FSU no
longer supports it but does not wish to change the license). A few
other people and I are considering setting up a sourceforge project to
keep it going, but no one really wants to put a lot of time into it
without a free software license. I've let the author (who has been
trying to get a license change for years now) know that we'd like to
have the commercial distribution clause dropped so that DQS can go into
main, but he hasn't had any luck yet and is out in the real world now.
I'm not sure that DQS was ever suitable for very large clusters, and it
has migration difficulties between releases due to the design of its
internode communication protocol (not that I know PBS is any better in
this respect).

Two DFSG-free alternatives that have some significant disadvantages are
GNQS (POSIX, orphaned for years upstream, and never suitable for
distributed parallel jobs) and GNU queue (non-POSIX, relocation may
conflict with conventional distributed parallel jobs, scalable?).

Many programs implement their own private single-purpose queueing
systems. It would be much better if we could provide several alternate
standard queueing systems and modify these programs to use the standard
system directly.
For instance, you might run seti at low priority in the queueing system
(requeueing on completion), rip CDs locally but do the encoding out on
the cluster, and perform daily system management (the slocate scans,
for instance) and all other resource-intensive tasks in a more
efficient serialized-per-node manner, rather than having competing
little queueing systems fight over the disk head positions as they all
try to run their pet tasks simultaneously. If we could hack together a
simple, tiny, robust, single-node, POSIX-compatible queueing system, we
could ship it as standard (or even essential) and make all the other
cluster queueing systems drop-in replacements for it.

Another area where we are weak is network filesystems, but then so is
everyone as far as I know, so we aren't really behind the 8-ball. It
would be nice to be out in front with solid, well-documented, easily
configured DFS/AFS/Coda/Intermezzo/Mosix systems instead of just grotty
old NFS. Does Scyld have anything new there?
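
Going back to the single-node queue idea for a moment, here is a rough
sketch of what "serialized-per-node" could look like. It is only an
illustration, not a proposal for the real thing: the LocalQueue class
and its submit/run_all methods are names made up for this example, it
does not implement the POSIX batch interface, and it runs everything
in the foreground of one process. Lower priority numbers run first,
and jobs with equal priority run in submission order.

```python
import heapq
import itertools
import subprocess
import sys


class LocalQueue:
    """Hypothetical single-node queue: one job at a time, in priority order."""

    def __init__(self):
        self._heap = []                # entries: (priority, sequence, argv)
        self._seq = itertools.count()  # tie-breaker: FIFO within a priority

    def submit(self, argv, priority=0):
        """Queue a command line; lower numbers run first."""
        heapq.heappush(self._heap, (priority, next(self._seq), list(argv)))

    def run_all(self):
        """Run queued jobs strictly one after another.

        Returns a list of (argv, exit status) pairs in execution order.
        """
        results = []
        while self._heap:
            _priority, _seq, argv = heapq.heappop(self._heap)
            status = subprocess.run(argv).returncode
            results.append((argv, status))
        return results


if __name__ == "__main__":
    # Stand-in jobs; a real setup would submit the encoder, the locate
    # database update, and so on instead of these print commands.
    py = sys.executable
    q = LocalQueue()
    q.submit([py, "-c", "print('background number crunching')"], priority=19)
    q.submit([py, "-c", "print('daily locate database scan')"], priority=5)
    q.submit([py, "-c", "print('encode ripped tracks')"], priority=10)
    q.run_all()
```

Because only one job runs at a time, the disk-heavy tasks never compete
with each other; a cluster queueing system acting as a drop-in
replacement would only need to accept the same kind of submissions.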

