Greetings,

I'm currently employed at a site with some Xserve G5s and a smattering of PIIIs. I cannot comment on High Availability clusters, but I'll be more than willing to discuss the HPC side of clusters.

Right now we primarily run OS X on the G5s; however, work is in progress to allow switching between OS X and Linux (Debian or Gentoo currently, possibly others in the future) at job-submission time, based upon user-submitted requests.

As we run a variety of operating systems, I personally prefer to compile the HPC-oriented applications from source. Anyway, I noticed a request for software recommendations earlier in this thread, so here's a list of the first things I end up installing when we build a test/development cluster, along with the versions I have running.

Torque (2.0.0p5)
Mpich (1.2.7)
Mpichgm (Myrinet support, based on 1.2.6)
Mpiexec (0.80)
Atlas (3.7.11)
HPL (To test the install mainly)
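
As a rough illustration of how HPL can exercise the rest of that stack, here is a minimal sketch that generates a Torque job script which launches HPL through mpiexec. The function name, job name, node counts and the ./xhpl path are all hypothetical placeholders, and it assumes an HPL.dat sized for the requested process count already sits in the submission directory; it is not the exact script we use.

```python
def hpl_job_script(nodes=2, ppn=2, walltime="00:30:00", hpl_binary="./xhpl"):
    """Return the text of a Torque/PBS job script that runs HPL.

    All defaults are illustrative assumptions, not site settings.
    """
    lines = [
        "#!/bin/sh",
        "#PBS -N hpl-smoke-test",            # hypothetical job name
        f"#PBS -l nodes={nodes}:ppn={ppn}",  # nodes and processors per node
        f"#PBS -l walltime={walltime}",      # wall-clock limit for the job
        "#PBS -j oe",                        # merge stderr into stdout
        'cd "$PBS_O_WORKDIR"',               # directory qsub was run from
        f"mpiexec {hpl_binary}",             # mpiexec takes the node list from Torque
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Print the script so it can be redirected to a file and submitted.
    print(hpl_job_script(), end="")
```

Redirect the output to a file and hand it to qsub; if the job finishes and HPL reports passing residual checks, Torque, the MPI build, mpiexec and the BLAS library are at least talking to each other.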

We also find it nice to have server(s) providing:
LDAP
DHCP and related netbooting services (we've written our own, highly alpha stage right now)
NFS for home directories only (we've found numerous scalability problems with diskless)

Of course the shameless plug for our MyPBS package is also required,
http://sourceforge.net/projects/my-pbs/

This is just a quick list of what I think any documentation on an HPC cluster needs to cover at minimum. I'm by no means an expert, but I would like to offer my help.

Hanni Ali wrote:

Ok,

I suggest we try to put the documentation together on gentoo-wiki.com <http://gentoo-wiki.com>. I've always found this site an excellent resource.

There are already two stubs which I feel we should build on, and kyron has compiled an excellent list of programs if you follow the links.

http://gentoo-wiki.com/Index:HOWTO#Build_a_Gentoo_High_Performance_Cluster

I suggest we start "Build a Gentoo High Availability Cluster".

http://www.gentoo.org/proj/en/cluster/

This is the Gentoo cluster page, and we only have three HOWTOs, all of which have flaws. I've kept tabs on problems I've run into with the HPC HOWTO and the distcc HOWTO. I feel we should keep openMosix separate and have a completely separate set of HOWTOs for that.

My clusters are generally diskless nodes, so I suggest we try to incorporate this HOWTO into the gentoo-wiki:

http://www.gentoo.org/doc/en/diskless-howto.xml

Though this also has its fair share of difficulties.

I'm prepared to share a certain amount of my work on this. It would be nice to make this documentation easily understandable for all, and I'm always up for people adding where they ran into problems, and WHY, to these sorts of documents.

I'm looking carefully at HA diskless nodes and ways in which to ensure redundancy if the master node fails. Suggestions on this would be welcomed.

How many people would be interested in helping out with this? If you've read this far it must be because it's a Friday afternoon, so anything can distract you!

Cheers

Hanni



--
Justin Bronder
University of Maine, Orono

Advanced Computing Research Lab
20 Godfrey Dr
Orono, ME 04473
www.clusters.umaine.edu

Mathematics Department
425 Neville Hall
Orono, ME 04469


--
gentoo-cluster@gentoo.org mailing list
