Dan,

> I am very envious of your cluster, each node sounds quite powerful, much
> more powerful than my severely dated machines. However, we must all start
> somewhere and hopefully this will appeal to the 'powers that be' and
> also be useful for the undergrads for initial steps into learning
> parallel programming concepts.

This is more or less what happened here. We started with a small prototype and the idea caught on. Before that we used a set of ten 500 MHz Alphas (running Linux) in a more traditional setting, with disks and no remote boot. At some point we took the disks off, because they gave us continuous trouble, and started using NFSroot. That was a big improvement, but we still booted from floppy (two floppies, in fact: one for MILO, one for the kernel). Recently we deactivated the Alphas here in the Department and passed them on to the computer center of the Institute, because they gave us too much administration work. It's still a good machine; they have the manpower to handle it, we don't.

So the process happened in several stages. The nodes I described are from our main machine, which currently has 21 nodes. There are a few others in this University using 800 MHz and 1000 MHz nodes. Currently we can buy one of those nodes for a bit less than US$ 500, so our 21 nodes total about US$ 10,000. If you add the 24-port 100 Mbps 3COM switch, US$ 2,500 for a nice server and some more infrastructure items, the whole thing comes up to something like US$ 15,000. This looks like a lot of money to a person (well, to me it does |:-) but it is peanuts compared with what used to be spent here for the purchase of considerably _less_ computer power. The traditional approach in universities is big brand-name boxes, you know.

> I am in the process of creating my node kernel as we speak, I am using the
> make-kpkg --append_to_version diskless buildpackage
>
> I have been doing a lot of digging, and I am forcing myself to do things
> the right way the first time. From what I have managed to dig up, the
> make-kpkg tool is the best for a debian system. I know I will not be
> working on this machine forever and I want to make a new admin
> (hopefully an undergrad student) as comfortable as possible, with as
> much documentation as possible.
>
> I assume you are using debian, and did you use this make-kpkg process? I
> know it's a one time thing, and once set up is not necessary to modify,
> but if you recall what was done, it would be helpful to me.

I think this is the utility in the kernel-package package, right? Well, we do use Debian exclusively here, but the one thing we do not use Debian for is compiling the kernel. We always compile our kernels directly from the original sources. It is easy, instructive, and fun!

When we built our first node we started by compiling a kernel for it on the server and booting it from floppy. First we got the NFSroot part to work, then worried about network booting. For that, we took our floppy kernel, ran the Etherboot mknbi-linux on it and installed the result in the tftpboot directory. At first we encoded all the NFSroot boot parameter information into the NBI boot block of the kernel; later we switched to using DHCP to do this, which is a much better (centralized) way to manage the whole thing.

In fact, I recorded the whole experience in a howto which is available online, but it has two problems, a small one and a big one. The small one is that it is a bit dated, about a year old, and there are many improvements we are already using which are not mentioned there. I've been meaning to write a new version of it, but where is the time... The big one is that it is in Portuguese, since it was meant for national consumption here. I have considered translating it, but where is the time for that? I even started writing a semi-automatic translation tool that could be used for this, but again, no time to finish it. Anyhow, here is the address if anyone out there knows enough Portuguese (or Spanish) to be able to make head and tail of it:

http://latt.if.usp.br/pmc/

It is rather extensively documented, including libraries of configuration files, scripts, diagrams, etc. I even have some snapshots of an early version of the machines, but they are not yet online.

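To make the difference between the two ways of passing NFSroot parameters a bit more concrete, here is a small sketch in Python. It is only an illustration and is not taken from the howto: the function name, the addresses, the hostname and the export path are all made-up placeholders. The only things it relies on are the standard Linux kernel NFSroot parameters (root=/dev/nfs, nfsroot=, ip=). With static parameters every node needs its own command line; with ip=dhcp the same line can serve all nodes, and the per-node details are handed out by the DHCP server.

```python
def nfsroot_cmdline(nfs_root=None, static=None):
    """Build a kernel command line for a diskless NFS-root node.

    nfs_root: optional "server-ip:/path" string for the nfsroot= parameter.
    static:   optional dict with per-node network settings; if omitted,
              the kernel is told to autoconfigure via DHCP instead.
    All example values used below are hypothetical placeholders.
    """
    args = ["root=/dev/nfs"]
    if nfs_root:
        args.append(f"nfsroot={nfs_root}")
    if static is None:
        # Centralized variant: the node gets its address from DHCP.
        # Assumption: if nfsroot= is left out, the root path has to come
        # from somewhere else, e.g. the DHCP server's root-path option.
        args.append("ip=dhcp")
    else:
        # Per-node variant, kernel format:
        # ip=client:server:gateway:netmask:hostname:device:autoconf
        fields = [
            static["client"],
            static["server"],
            static.get("gateway", ""),
            static["netmask"],
            static["hostname"],
            static["device"],
            "off",  # no autoconfiguration, everything is given explicitly
        ]
        args.append("ip=" + ":".join(fields))
    return " ".join(args)


if __name__ == "__main__":
    # One command line per node, all details hard-coded (placeholders).
    print(nfsroot_cmdline(
        nfs_root="192.168.0.1:/nfsroot/node01",
        static={
            "client": "192.168.0.11",
            "server": "192.168.0.1",
            "netmask": "255.255.255.0",
            "hostname": "node01",
            "device": "eth0",
        },
    ))
    # One command line for every node; DHCP supplies the rest.
    print(nfsroot_cmdline())
```

In the first form a change of address or root directory means rebuilding the boot image of that node; in the second form it means editing one file on the server.
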
I am willing to explain our whole strategy and architecture if people are interested, but if I just jump into this, these messages are bound to get unbearably long. So I will give a general explanation of some of the basics in the answer to the next message, and take it from there.

Cheers,

----------------------------------------------------------------
        Jorge L. deLyra,  Associate Professor of Physics
            The University of Sao Paulo,  IFUSP-DFMA
       For more information: finger [EMAIL PROTECTED]
----------------------------------------------------------------

