Greetings,
I'm currently employed at a site with some Xserve G5s and a smattering
of PIIIs.
I cannot comment on High Availability clusters, but I'd be more than
willing to discuss the HPC side of clusters.
Right now we primarily run OS X on the G5s; however, work is in progress
to allow switching between OS X and Linux (currently Debian or Gentoo,
possibly others in the future) at job-submission time, based on what the
user requests.
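To illustrate the kind of user-submitted request mentioned above, here is a
minimal sketch (not our actual implementation, which is still in progress) of
generating a Torque job script that asks for nodes carrying a given node
property. The property names "linux" and "osx" are hypothetical labels an
administrator would have to assign in the server's nodes file, and
build_pbs_script is a made-up helper name. Note that this only selects nodes
already tagged with the property; it does not reboot anything.

```python
def build_pbs_script(job_name, nodes, ppn, os_property, command):
    """Return the text of a Torque/PBS job script.

    os_property is a node property (hypothetical labels here: "linux",
    "osx") appended to the nodes= resource request so the scheduler only
    picks nodes tagged with it.
    """
    if nodes < 1 or ppn < 1:
        raise ValueError("nodes and ppn must be positive")
    if not os_property or ":" in os_property or " " in os_property:
        raise ValueError("os_property must be a single property name")

    resource = f"nodes={nodes}:ppn={ppn}:{os_property}"
    lines = [
        "#!/bin/sh",
        f"#PBS -N {job_name}",
        f"#PBS -l {resource}",
        "cd $PBS_O_WORKDIR",  # start in the directory qsub was run from
        command,
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Example: a 4-node HPL run restricted to nodes tagged "linux".
    print(build_pbs_script("hpl-test", 4, 2, "linux", "mpiexec ./xhpl"))
```

The resulting text could be saved to a file and handed to qsub; how the
nodes actually end up booted into the requested OS is a separate problem
that this sketch does not address.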
As we run a variety of operating systems, I personally prefer to compile
the HPC-oriented applications from source. I noticed a request for
software recommendations earlier in this thread, so here's a list of the
first things I end up installing when we build a test/development
cluster, along with the versions I have running.
Torque (2.0.0p5)
MPICH (1.2.7)
MPICH-GM (Myrinet support, based on 1.2.6)
Mpiexec (0.80)
ATLAS (3.7.11)
HPL (mainly to test the install)
We also find it nice to have server(s) providing:
LDAP
DHCP and related netbooting services (we've written our own, which is
at a highly alpha stage right now).
NFS for home directories only. We've found numerous scalability
problems with diskless nodes.
Of course, a shameless plug for our MyPBS package is also required:
http://sourceforge.net/projects/my-pbs/
This is just a quick list of what I think any documentation on an HPC
cluster needs to cover at minimum. I'm by no means an expert, but I
would like to offer my help.
Hanni Ali wrote:
Ok,
I suggest we try to put the documentation together on
http://gentoo-wiki.com; I've always found this site an excellent
resource.
There are already two stubs which I feel we should build on, and kyron
has compiled an excellent list of programs if you follow the links.
http://gentoo-wiki.com/Index:HOWTO#Build_a_Gentoo_High_Performance_Cluster
I suggest we start "Build a Gentoo High Availability Cluster".
http://www.gentoo.org/proj/en/cluster/
This is the Gentoo cluster page, and we only have three HOWTOs, all of
which have flaws. I've kept tabs on problems I've run into with the
HPC HOWTO and the distcc HOWTO. I feel we should keep openMosix separate
and have a completely separate set of HOWTOs for that.
My clusters are generally diskless nodes, so I suggest we try to
incorporate this HOWTO into the gentoo-wiki:
http://www.gentoo.org/doc/en/diskless-howto.xml
Though this also has its fair share of difficulties.
I'm prepared to share a certain amount of my work on this. It would be
nice to make this documentation easily understandable for all, and I'm
always in favour of people noting where they ran into problems, and WHY,
in these sorts of documents.
I'm looking carefully at HA diskless nodes and ways in which to ensure
redundancy if the master node fails. Suggestions on this would be
welcomed.
How many people would be interested in helping out with this? If
you've read this far it must be because it's a Friday afternoon, so
anything can distract you!
Cheers
Hanni
--
Justin Bronder
University of Maine, Orono
Advanced Computing Research Lab
20 Godfrey Dr
Orono, ME 04473
www.clusters.umaine.edu
Mathematics Department
425 Neville Hall
Orono, ME 04469
--
gentoo-cluster@gentoo.org mailing list