Hey Guys,

I was at Washington University and saw they were going to trash a
5-node computer cluster, and thought ByteWorks might want to use it.
Below is a blog post about the setup, written by the person who built
it. It's a 5-node cluster with Coppermine processors. A lot of this
stuff I don't understand. Let me know what you think.


Klippa - 5 node cluster, each node is 2x866MHz PIII Coppermine with  
1.5GB RAM.

Trying to install OpenMosix on head node, pushed out to others by
DHCP/TFTP, autodiscovery of OpenMosix nodes.

1.  Install Debian Sarge using 2.4 kernel (2.4.27).
2.  Get vanilla 2.4.26 kernel. Patch with OpenMosix from
http://openmosix.sf.net.
3.  Compile kernel, include certain modules as specified at
http://www.gentoo.org/doc/en/diskless-howto.xml and at
http://www.gentoo.org/doc/en/openmosix-howto.xml. Do not use --initrd,
this just complicates matters, and we're trying to compile in all the
drivers we'll need. (Previously, tried to use initrd, and had this
here: Use --initrd option, but will have to change to EXT2 initrd
because only Debian patches have cramfs patch - use directions and
script at http://linuxmafia.com/faq/Debian/mkext2initrd.html.)
4.  Install tftpd-hpa, nfs-kernel-server, dhcp3-server, bind9, squid, pxe,
syslinux, ash, mknbi, dialog, pump, cloop-src packages. Some of these
are so that the clusterknoppix script won't complain. Since we're not
using clusterknoppix now, though, probably don't need mknbi, pump, and
cloop-src.
5.  Compile cloop-src using make-kpkg --initrd --append-to-version
-060220 modules_image. This gives an error when creating the .deb file, but it
seems to compile ok. Actually the cloop-src that comes with sarge is
version 2.01.5-4, but this gives some other compile error. I used the
version that comes with etch, 2.02.1+eb.10. This compiles ok but
doesn't make the deb as above, but just copy the module to
/lib/modules/2.4.26-om1-060220/kernel/drivers/extra (have to make the
extra directory), depmod, then modprobe cloop works ok. (This isn't
needed any more when using diskless as in the rest of the steps.)
6.  Get userspace utilities from http://openmosix.sf.net. Download the
rpm, install the alien package to convert it to a deb, and install it.
Link /etc/init.d/openmosix to /etc/rc2.d/S99openmosix. Edit
/etc/openmosix/openmosix.config to use autodiscovery and to use eth1
for the autodiscovery daemon.
7.  Install diskless package. Follow docs at
http://www.wlug.org.nz/NFSRoot basically as they had them there.
8.  Most of the packages are already installed and close to set up, and
the kernel is pretty much ready. Make /tmp/nfsroot, run
diskless-createbasetgz /tmp/nfsroot/ sarge
http://mirrors.kernel.org/debian /tmp/base.tgz.
9.  Download diskless-image-simple deb from
http://mirrors.kernel.org/debian/pool/d/diskless directory, make sure
to get the version that matches the diskless package (0.3.18.0.5). Put
it in /tmp.
10.  Run diskless-newimage, pick reasonable values like klippa for the
master server and mail server, etc. Mostly take the defaults.
11.  Clean up the install after doing a chroot
/var/lib/diskless/default/root. Do a base-config, configure apt, add
the contrib and non-free sources to the main sources in
/etc/apt/sources.list, update packages, make sure to install devfsd.
Exit the chroot.
12.  Copy the openmosix userspace utilities deb and the openmosix custom
kernel to /var/lib/diskless/default/root/root. Chroot back into
/var/lib/diskless/default/root, then install those debs. Link
/etc/init.d/openmosix to /etc/rc2.d/S99openmosix so that it starts on
boot. The docs at that www.wlug.org.nz page suggest editing the /etc
config files, but I didn't need to change anything else. Exit the
chroot.
13.  Run diskless-newhost /var/lib/diskless/default/root 192.168.1.2.
Enter hostname (klippa2) and mail server (klippa), then it copies a bunch
of files. Do the same for 192.168.1.3-5.
14.  Make a /tftpboot directory. Copy the openmosix kernel image there.
Copy /usr/lib/syslinux/pxelinux.0 there. Make a pxelinux.cfg
directory, and make a default file in there that follows the example
on www.wlug.org.nz but changes the ip address and kernel image
filename. Make sure inetd.conf is set up right to point to the
/tftpboot directory; Debian defaults to /var/lib/tftpboot.
15.  Set up DHCP configuration file as in the gentoo pages above, except
change it so that it's not so restrictive and I don't have to edit it
every time there's a new host. Basically, do not have per-host blocks
which assign a specific IP address to a specific MAC address. Instead,
set the pool block to the number of IP addresses I need (range
192.168.1.2 192.168.1.5;), put the routers/domain servers in there
(option routers 192.168.1.1;, then option domain-name-servers
192.168.1.1;, then option domain-name "wustl.edu";), and comment out
the deny unknown-clients. This will just assign the address pool that
I've set up the newhost diskless filesystems for, and whoever comes up
with a given IP address will just get that file system - they're the
same anyway.
16.  Set up NFS, export the filesystems as on the www.wlug.org.nz page,
changing the IP addresses and using the NFS options from the gentoo
pages (sync,rw,no_root_squash,no_all_squash). Restart the nfs server.
17.  Boot the clients, and everything should come up. There is now a
5-node cluster, as shown by openmosixview on the main node and by testing
with a little awk script from the openmosix howto.
18.  Adding new nodes should involve increasing the IP address pool range
in /etc/dhcp3/dhcpd.conf, running diskless-newhost for the new IP
addresses, and changing the per-IP exports in /etc/exports. Restart
the dhcp and nfs servers, and it should go.
19.  Lots of extra configuration needed, basically chroot into
/var/lib/diskless/default/root and dselect to install stuff, then edit
the files in /etc in the chroot. Need to set up lo interface in
/etc/network/interfaces, for example, otherwise a lot of stuff didn't
work.
20.  On head node need to set up IP masquerading, edit /etc/network/options
to turn on ip_forward, add iptable_nat to /etc/modules, add an
S99masquerade script to /etc/rc2.d which has iptables -t nat -A
POSTROUTING -o eth0 -j MASQUERADE.
21.  IP masquerading on the head node allows NFS mounting through the head
node to the outside network (128.252.171.0 for us). Edit the template
fstab file to put in sh-pod00's IP address for mounting /home and
/usr/local at
/var/lib/diskless/default/root/usr/lib/diskless-image/template/etc/fstab.
22.  The debian lam4 and lam-runtime packages use shared memory which
prevents openmosix from migrating their processes. So download the
lam-mpi source and compile and install it, then this works. Follow
instructions at
http://howto.ipng.be/openMosixWiki/index.php/Using%20LAM-MPI%20with%20openMosix
23.  Forcing an install of the debian clustalw-mpi package doesn't work,
since the program looks for shared libraries. Have to recompile it
also, then it runs fine and aligns a fimH sequence file in 23 minutes
over 10 CPUs where my 3GHz P4 does it in about 60-70 minutes.
Downloaded from http://web.bii.a-star.edu.sg/~kuobin/clustalg/
24.  The debian ncbi packages don't seem to have what mpiblast wants. So
download and install these from ftp.ncbi.nih.gov/toolbox/ncbi_tools.
Need the old version, follow directions at
http://mpiblast.lanl.gov/Docs.Install.html. Patch the toolbox, then
compile them. These went into /usr/src/ncbi-toolbox/ncbi. Then
configure and compile mpiblast. For the nodes to run blast, they all
need the ncbi data files, so chroot into
/var/lib/diskless/default/root again, dselect and install blast2,
which pulls in the ncbi libraries/tools needed, then exit the chroot
and regenerate the filesystems with diskless-newhost.
25.  I kind of want mfs to have local storage. This was removed from the
2.4.26 openMosix patch, so go back to 2.4.24. Download the vanilla
kernel source, get the openMosix patch, apply it, make oldconfig from
the 2.4.26 config file, enable mfs, recompile and install. Reboot to
make sure it works, then copy the kernel image to /tftpboot/vmlinuz,
copy /lib/modules/2.4.24-om2-060307 to
/var/lib/diskless/default/root/lib/modules/2.4.24-om2-060307. Add mfs
mount line to /etc/fstab and to
/var/lib/diskless/default/root/usr/lib/diskless-image/template/etc/fstab.
Make the /mfs directory. mount -a on the head node, sync all the
diskless images, and reboot the nodes. They come up into the cluster
ok and have /mfs mounted. I'm not sure this is truly local, though,
since the root directory on each node is nfs mounted.
26.  I'm not sure mfs truly has local access, though, since the root
directory on each node is nfs mounted. Make a /local directory local
to each node. Make a /local and /local/mfs in the root of the head
node, then under /var/lib/diskless/default/root. Turns out swap is not
turned on, so add a line to mount swap from /dev/hda2 on each node
then add a line to mount /dev/hda1 to /local and mfs_mnt to /local/mfs
on each node in
/var/lib/diskless/default/root/usr/lib/diskless-image/template/etc/fstab.
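
For anyone trying to picture what steps 14 to 16 and 18 produce on the
head node, here is a rough sketch, not the original author's files. It
only prints example fragments for pxelinux.cfg/default, dhcpd.conf and
/etc/exports. The head node address (192.168.1.1), the per-host
directory layout under /var/lib/diskless/default, and the function names
are all assumptions made for illustration. The PXE-specific DHCP options
from the gentoo pages are left out.

```shell
#!/bin/sh
# Sketch only: prints config fragments to stdout and never touches /etc.
# All values below are assumptions for illustration, not the original setup.
SUBNET="192.168.1"                       # cluster-internal network
HEAD="${SUBNET}.1"                       # assumed head node address
DISKLESS="/var/lib/diskless/default"     # assumed parent of per-host trees
NFS_OPTS="sync,rw,no_root_squash,no_all_squash"

# /tftpboot/pxelinux.cfg/default: boot the kernel with an NFS root (step 14).
pxe_default() {
    cat <<EOF
DEFAULT linux
LABEL linux
    KERNEL vmlinuz
    APPEND root=/dev/nfs nfsroot=${HEAD}:${DISKLESS}/root ip=dhcp
EOF
}

# dhcpd.conf subnet block with one shared pool, no per-host entries (step 15).
dhcp_pool() {
    first=$1
    last=$2
    cat <<EOF
subnet ${SUBNET}.0 netmask 255.255.255.0 {
    option routers ${HEAD};
    option domain-name-servers ${HEAD};
    option domain-name "wustl.edu";
    pool {
        range ${SUBNET}.${first} ${SUBNET}.${last};
        # deny unknown-clients;
    }
}
EOF
}

# One /etc/exports line per client IP in the pool (steps 16 and 18).
nfs_exports() {
    i=$1
    last=$2
    while [ "$i" -le "$last" ]; do
        echo "${DISKLESS}/${SUBNET}.${i} ${SUBNET}.${i}(${NFS_OPTS})"
        i=$((i + 1))
    done
}

pxe_default
dhcp_pool 2 5
nfs_exports 2 5
```

Adding a sixth node under these assumptions would mean calling
dhcp_pool 2 6 and nfs_exports 2 6, comparing the output against the real
files, and then running diskless-newhost for the new address as in
step 18.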

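The S99masquerade script from step 20 might look something like the
sketch below. The iptables rule is the one quoted in that step. The
DRY_RUN and OUT_IF variables and the run helper are hypothetical
additions so the command can be previewed without root, and are not part
of the original script.

```shell
#!/bin/sh
# Sketch of an S99masquerade-style script.
# DRY_RUN and OUT_IF are hypothetical knobs added for illustration.
DRY_RUN="${DRY_RUN:-1}"    # 1 = only print the command, 0 = execute it
OUT_IF="${OUT_IF:-eth0}"   # outward-facing interface, eth0 in step 20

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# NAT everything leaving through the outward-facing interface.
masquerade() {
    run iptables -t nat -A POSTROUTING -o "$OUT_IF" -j MASQUERADE
}

masquerade
```

Running it with DRY_RUN=0 as root would apply the rule. It still relies
on ip_forward being turned on and iptable_nat being loaded, as described
in step 20.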

A previous attempt tried to leverage the clusterknoppix stuff. The
following steps went in after installing the userspace openmosix
utilities, but I couldn't get it to work.

1.  Get the clusterknoppix cd, and copy over the cd which you see when you
mount it, and the cd image which you can get from mounting the
/cdrom/KNOPPIX/KNOPPIX file as a compressed loop device (cloop). Copy
these to /mnt/knoppix-cd (mounted cdrom, has /boot and /KNOPPIX
directories) and /mnt/knoppix-image (has a normal looking root
filesystem).
2.  Link /mnt/knoppix-image/bin/ash.static to /bin. Link
/mnt/knoppix-image/usr/share/knoppix-terminalserver to /usr/share.
Link /mnt/knoppix-image/usr/share/knoppix-terminalopenmosixserver to
/usr/share.
3.  Modify the /mnt/knoppix-image/usr/sbin/knoppix-terminalopenmosixserver
script to mount /mnt/knoppix-cd instead of /cdrom.
4.  Grab the openmosixview RedHat 9.0 rpm, use alien to convert, install
the .deb. Need to install libqt3c102-mt, xserver-common,
xbase-clients, and all their dependencies to run this.

I also tried using lessdisks and initrd-netboot-tools. These
didn't seem to work so well for me.

