Hey Guys, I was at Washington University and saw they were going to trash a 5 node computer cluster, and thought ByteWorks might want to use it. Below is a blog post about the setup, written by the person who set it up. It's a 5 node cluster with the Coppermine processor. A lot of this stuff I don't understand. Let me know what you think.

Klippa - 5 node cluster, each node is 2x866MHz PIII Coppermine with 1.5GB RAM.

Trying to install OpenMosix on head node, pushed out to others by DHCP/TFTP, autodiscovery of OpenMosix nodes.

1. Install Debian Sarge using 2.4 kernel (2.4.27).
2. Get vanilla 2.4.26 kernel. Patch with OpenMosix from http://openmosix.sf.net.
3. Compile kernel, include certain modules as specified at http://www.gentoo.org/doc/en/diskless-howto.xml and at http://www.gentoo.org/doc/en/openmosix-howto.xml. Do not use --initrd, this just complicates matters, and we're trying to compile in all the drivers we'll need. (Previously, tried to use initrd, and had this here: Use --initrd option, but will have to change to EXT2 initrd because only Debian patches have cramfs patch - use directions and script at http://linuxmafia.com/faq/Debian/mkext2initrd.html.)
4. Install tftp-ha, nfs-kernel-server, dhcp3-server, bind9, squid, pxe, syslinux, ash, mknbi, dialog, pump, cloop-src packages. Some of these are so that the clusterknoppix script won't complain. Since we're not using clusterknoppix now, though, probably don't need mknbi, pump, and cloop-src.
5. Compile cloop-src using make-kpkg --initrd --append-to-version -060220 modules_image. This gives an error when creating the .deb file, but it seems to compile ok. Actually the cloop-src that comes with sarge is version 2.01.5-4, but this gives some other compile error. I used the version that comes with etch, 2.02.1+eb.10. This compiles ok but doesn't make the deb as above, but just copy the module to /lib/modules/2.4.26-om1-060220/kernel/drivers/extra (have to make the extra directory), depmod, then modprobe cloop works ok. (This isn't needed any more when using diskless as in the rest of the steps.)
6. Get userspace utilities from http://openmosix.sf.net. Download the rpm, install the alien package to convert it to a deb, and install it. Link /etc/init.d/openmosix to /etc/rc2.d/S99openmosix. Edit /etc/openmosix/openmosix.config to use autodiscovery and to use eth1 for the autodiscovery daemon.
7. Install diskless package. Follow docs at http://www.wlug.org.nz/NFSRoot basically as they had them there.
8. Most of the packages are already installed and close to set up, and the kernel is pretty much ready. Make /tmp/nfsroot, run diskless-createbasetgz /tmp/nfsroot/ sarge http://mirrors.kernel.org/debian /tmp/base.tgz.
9. Download diskless-image-simple deb from http://mirrors.kernel.org/debian/pool/d/diskless directory, make sure to get the version that matches the diskless package (0.3.18.0.5). Put it in /tmp.
10. Run diskless-newimage, pick reasonable values like klippa for the master server and mail server, etc. Mostly take the defaults.
11. Clean up the install after doing a chroot /var/lib/diskless/default/root. Do a base-config, configure apt, add the contrib and non-free sources to the main sources in /etc/apt/sources.list, update packages, make sure to install devfsd. Exit the chroot.
12. Copy the openmosix userspace utilities deb and the openmosix custom kernel to /var/lib/diskless/default/root/root. Chroot back into /var/lib/diskless/default/root, then install those debs. Link /etc/init.d/openmosix to /etc/rc2.d/S99openmosix so that it starts on boot. The docs at that www.wlug.org.nz page suggest editing the /etc config files, but I didn't need to change anything else. Exit the chroot.
13. Run diskless-newhost /var/lib/diskless/default/root 192.168.1.2. Enter hostname (klippa2) and mail server (klippa), then it copies a bunch of files. Do the same for 192.168.1.3-5.
14. Make a /tftpboot directory. Copy the openmosix kernel image there. Copy /usr/lib/syslinux/pxelinux.0 there. Make a pxelinux.cfg directory, and make a default file in there that follows the example on www.wlug.org.nz but changes the ip address and kernel image filename. Make sure inetd.conf is set up right to point to the /tftpboot directory, Debian defaults to /var/lib/tftpboot.
15. Set up the DHCP configuration file as in the gentoo pages above, except change it so that it's not so restrictive and I don't have to edit it every time there's a new host. Basically, do not have per-host blocks which assign a specific IP address to a specific MAC address. Instead, set the pool block to the number of IP addresses I need (range 192.168.1.2 192.168.1.5;), put the routers/domain servers in there (option routers 192.168.1.1;, then option domain-name-servers 192.168.1.1;, then option domain-name "wustl.edu";), and comment out the deny unknown-clients. This will just hand out addresses from the pool that I've set up the newhost diskless filesystems for, and whoever comes up with a given IP address will just get that file system - they're the same anyway.
16. Set up NFS, export the filesystems as on the www.wlug.org.nz page, changing the IP addresses and using the NFS options from the gentoo pages (sync,rw,no_root_squash,no_all_squash). Restart the nfs server.
17. Boot the clients, and everything should come up. There is now a 5-node cluster, as shown by openmosixview on the main node and by testing with a little awk script from the openmosix howto.
18. Adding new nodes should involve increasing the IP address pool range in /etc/dhcp3/dhcpd.conf, running diskless-newhost for the new IP addresses, and changing the per-IP exports in /etc/exports. Restart the dhcp and nfs servers, and it should go.
19. Lots of extra configuration needed, basically chroot into /var/lib/diskless/default/root and dselect to install stuff, then edit the files in /etc in the chroot. Need to set up the lo interface in /etc/network/interfaces, for example, otherwise a lot of stuff didn't work.
20. On the head node need to set up IP masquerading: edit /etc/network/options to turn on ip_forward, add iptable_nat to /etc/modules, and add an S99masquerade script to /etc/rc2.d which has iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE.
21. IP masquerading on the head node allows NFS mounting through the head node to the outside network (128.252.171.0 for us). Edit the template fstab file at /var/lib/diskless/default/root/usr/lib/diskless-image/template/etc/fstab to put in sh-pod00's IP address for mounting /home and /usr/local.
22. The debian lam4 and lam-runtime packages use shared memory, which prevents openmosix from migrating their processes. So download the lam-mpi source and compile and install it, then this works. Follow instructions at http://howto.ipng.be/openMosixWiki/index.php/Using%20LAM-MPI%20with%20openMosix
23. Forcing an install of the debian clustalw-mpi package doesn't work, since the program looks for shared libraries. Have to recompile it also, then it runs fine and aligns a fimH sequence file in 23 minutes over 10 CPUs where my 3GHz P4 does it in about 60-70 minutes. Downloaded from http://web.bii.a-star.edu.sg/~kuobin/clustalg/
24. The debian ncbi packages don't seem to have what mpiblast wants. So download and install these from ftp.ncbi.nih.gov/toolbox/ncbi_tools. Need the old version, follow directions at http://mpiblast.lanl.gov/Docs.Install.html. Patch the toolbox, then compile them. These went into /usr/src/ncbi-toolbox/ncbi. Then configure and compile mpiblast. For the nodes to run blast, they all need the ncbi data files, so chroot into /var/lib/diskless/default/root again, dselect and install blast2, which pulls in the ncbi libraries/tools needed, then exit the chroot and regenerate the filesystems with diskless-newhost.
25. I kind of want mfs to have local storage. This was removed from the 2.4.26 openMosix patch, so go back to 2.4.24. Download the vanilla kernel source, get the openMosix patch, apply it, make oldconfig from the 2.4.26 config file, enable mfs, recompile and install. Reboot to make sure it works, then copy the kernel image to /tftpboot/vmlinuz, and copy /lib/modules/2.4.24-om2-060307 to /var/lib/diskless/default/root/lib/modules/2.4.24-om2-060307.
Add an mfs mount line to /etc/fstab and to /var/lib/diskless/default/root/usr/lib/diskless-image/template/etc/fstab. Make the /mfs directory. mount -a on the head node, sync all the diskless images, and reboot the nodes. They come up into the cluster ok and have /mfs mounted. I'm not sure this is truly local, though, since the root directory on each node is nfs mounted.
26. To give mfs storage that really is local, make a /local directory local to each node. Make a /local and /local/mfs in the root of the head node, then under /var/lib/diskless/default/root. Turns out swap is not turned on, so add a line to mount swap from /dev/hda2 on each node, then add a line to mount /dev/hda1 to /local and mfs_mnt to /local/mfs on each node in /var/lib/diskless/default/root/usr/lib/diskless-image/template/etc/fstab.

A previous attempt tried to leverage the clusterknoppix stuff. The following steps went in after installing the userspace openmosix utilities, but I couldn't get it to work.

1. Get the clusterknoppix cd, and copy over both the cd contents which you see when you mount it, and the cd image which you can get from mounting the /cdrom/KNOPPIX/KNOPPIX file as a compressed loop device (cloop). Copy these to /mnt/knoppix-cd (mounted cdrom, has /boot and /KNOPPIX directories) and /mnt/knoppix-image (has a normal looking root filesystem).
2. Link /mnt/knoppix-image/bin/ash.static to /bin. Link /mnt/knoppix-image/usr/share/knoppix-terminalserver to /usr/share. Link /mnt/knoppix-image/usr/share/knoppix-terminalopenmosixserver to /usr/share.
3. Modify the /mnt/knoppix-image/usr/sbin/knoppix-terminalopenmosixserver script to mount /mnt/knoppix-cd instead of /cdrom.
4. Grab the openmosixview RedHat 9.0 rpm, use alien to convert, install the .deb. Need to install libqt3c102-mt, xserver-common, xbase-clients, and all their dependencies to run this.

I also tried using the lessdisks and initrd-netboot-tools. These didn't seem to work so well for me.
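
For anyone trying to picture the pxelinux.cfg/default file from step 14, here is a rough sketch that writes one out. This is not the file from the original setup, which followed the www.wlug.org.nz example and is not reproduced in the post. The head node address (192.168.1.1), the NFS root path, and the label name are all assumptions for illustration. The kernel name vmlinuz is taken from step 25. The %s in the nfsroot path relies on the kernel replacing it with the client's IP address, which would match the one-filesystem-per-IP layout described in step 15, but check that against your kernel's nfsroot documentation.

```python
def pxelinux_default(kernel="vmlinuz",
                     nfs_server="192.168.1.1",
                     nfs_root="/var/lib/diskless/default/%s"):
    """Render a minimal pxelinux.cfg/default that boots one kernel with an NFS root.

    nfs_server and nfs_root are hypothetical values, not taken from the post.
    """
    # Kernel command line: mount root over NFS and configure the NIC by DHCP.
    append = f"root=/dev/nfs nfsroot={nfs_server}:{nfs_root} ip=dhcp"
    lines = [
        "DEFAULT linux",
        "LABEL linux",
        f"  KERNEL {kernel}",
        f"  APPEND {append}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Print the file so it can be reviewed before copying it to
    # /tftpboot/pxelinux.cfg/default by hand.
    print(pxelinux_default(), end="")
```

Printing instead of writing the file directly is deliberate, so the result can be compared with the www.wlug.org.nz example before anything under /tftpboot is touched.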

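
Steps 15, 16 and 18 boil down to keeping two things in sync with the node address range: the DHCP pool statements and the per-IP lines in /etc/exports. A small sketch of that bookkeeping is below. The DHCP statements and the NFS options are the ones quoted in the steps. The export directory layout (one directory per IP under /var/lib/diskless/default) is an assumption, since the post only says to follow the www.wlug.org.nz page, so adjust IMAGE_DIR to wherever diskless-newhost actually put the filesystems.

```python
import ipaddress

# Assumption: one filesystem directory per node IP under this path.
IMAGE_DIR = "/var/lib/diskless/default"
# NFS options quoted in step 16.
NFS_OPTIONS = "sync,rw,no_root_squash,no_all_squash"


def node_addresses(first, last):
    """Return every IPv4 address from first to last, inclusive, as strings."""
    start = ipaddress.IPv4Address(first)
    end = ipaddress.IPv4Address(last)
    if end < start:
        raise ValueError("last address is below first address")
    return [str(ipaddress.IPv4Address(n)) for n in range(int(start), int(end) + 1)]


def dhcp_pool_lines(first, last, gateway="192.168.1.1", domain="wustl.edu"):
    """Statements for the pool block described in step 15."""
    return [
        f"range {first} {last};",
        f"option routers {gateway};",
        f"option domain-name-servers {gateway};",
        f'option domain-name "{domain}";',
    ]


def export_lines(first, last):
    """One /etc/exports line per node, each restricted to that node's IP."""
    return [f"{IMAGE_DIR}/{ip} {ip}({NFS_OPTIONS})"
            for ip in node_addresses(first, last)]


if __name__ == "__main__":
    print("\n".join(dhcp_pool_lines("192.168.1.2", "192.168.1.5")))
    print("\n".join(export_lines("192.168.1.2", "192.168.1.5")))
```

To add nodes as in step 18, widen the range (say to 192.168.1.9), run diskless-newhost for each new address, paste the regenerated lines into /etc/dhcp3/dhcpd.conf and /etc/exports, and restart the dhcp and nfs servers.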