Hello Steve and list
Steve Herborn wrote:
> The hardware suite is actually quite sweet, but has been mismanaged rather
> badly. It has been left in a machine room that is too hot & on power that
> is more than flaky with no line conditioners. One of the very first things
> I had to do was replace almost two-dozen Power Supplies that were DOA.
Yes, 24 power supplies may cost as much as whatever was saved by skipping
the UPS and line conditioners, plus the headache of replacing them, plus
the failed nodes.
> I think I have most of the hardware issues squared away right now and need
> to focus on getting her up & running, but even installing the OS on a
> head node is proving to be troublesome.
Besides my naive encouragement to use Rocks,
I remember some recent discussions here on the Beowulf list
about different techniques to set up a cluster.
See this thread, and check the postings by
Bogdan Costescu, from the University of Heidelberg.
He seems to administer a number of clusters, some of which have
constraints comparable to yours, and to use a variety of tools for this:
http://www.beowulf.org/archive/2008-October/023433.html
http://www.iwr.uni-heidelberg.de/services/equipment/parallel/
> I really wish I could get away with using ROCKS as there would be such a
> greater reach back for me over SUSE. Right now I am exploring AutoYaST to
> push the OS out to the compute nodes,
Long ago I looked into SystemImager, which was then part of OSCAR,
but I don't know whether it is still current/maintained:
http://wiki.systemimager.org/index.php/Main_Page
> but that is still going to leave me
> short on any management tools.
That is true.
Tell your bosses they are asking you to reinvent the Rocks wheel.
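If you do go the AutoYaST route, the usual mechanism is to PXE-boot the
compute nodes and hand the SLES installer an AutoYaST profile on the kernel
command line. A rough sketch of a pxelinux.cfg entry (host name and paths
below are just placeholders, adjust to your setup):

    # /tftpboot/pxelinux.cfg/default  (example only)
    default sles_autoyast
    prompt 0
    label sles_autoyast
        kernel linux
        append initrd=initrd install=http://headnode/install/sles \
               autoyast=http://headnode/profiles/compute.xml

You can generate a starting profile from an already-installed node with
YaST's clone_system / autoyast modules and then edit it by hand.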
Good luck,
Gus Correa
--
---------------------------------------------------------------------
Gustavo J. Ponce Correa, PhD - Email: [EMAIL PROTECTED]
Lamont-Doherty Earth Observatory - Columbia University
P.O. Box 1000 [61 Route 9W] - Palisades, NY, 10964-8000 - USA
---------------------------------------------------------------------
Steven A. Herborn
U.S. Naval Academy
Advanced Research Computing
410-293-6480 (Desk)
757-418-0505 (Cell)
-----Original Message-----
From: Gus Correa [mailto:[EMAIL PROTECTED]
Sent: Monday, December 08, 2008 1:45 PM
To: Beowulf
Cc: Steve Herborn
Subject: Re: [Beowulf] Personal Introduction & First Beowulf Cluster
Question
Hello Steve and list
In the likely case that the original vendor no longer supports this
five-year-old cluster,
you can try installing the Rocks cluster suite, which is free from SDSC,
and which you have already come across:
http://www.rocksclusters.org/wordpress/
This would be the path of least resistance, and may get your cluster up and
running again with relatively little effort.
Of course there are many other solutions, but they may require more effort
from the system administrator.
Rocks is well supported and documented.
It is based on CentOS (a free rebuild of RHEL).
There is no support for SLES on Rocks,
so if you must keep the current OS distribution, it won't work for you.
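If you can switch, the Rocks install path is roughly this (a sketch from
memory; check the docs for the version you pick):

    # 1. Boot the head node from the Rocks DVD (or Kernel/Boot Roll) and walk
    #    through the frontend install screens.
    #    Rocks convention: eth0 = private cluster network, eth1 = public.
    # 2. On the running frontend, capture the compute nodes one by one:
    insert-ethers        # choose "Compute", then PXE-boot each node in turn
    # 3. Once they are installed, a quick sanity check:
    cluster-fork uptime  # runs the command on every compute node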
I read your last paragraph, but you may argue to your bosses that the age
of this machine doesn't justify being picky about the particular OS flavor.
Bringing it back to life and making it a useful asset,
with a free software stack, would be a great benefit.
You would spend money only on application software (e.g. a Fortran
compiler, Matlab, etc.).
Other solutions (e.g. Moab) will cost money, and may not work with
this old hardware.
Sticking to SLES may be a catch-22, a shot in the foot.
Rocks has a relatively large user base, and an active mailing list for help.
Note that Rocks requires at least 1 GB of RAM on every node,
two Ethernet ports on the head node, and one Ethernet port on each
compute node.
Check the hardware you have.
Although PXE boot capability is not strictly required, it makes
installation much easier.
Check your motherboard and BIOS.
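A few quick commands to run on a node to see what you have (assuming a
standard Linux install; interface names may differ on your hardware):

    free -m                        # installed RAM (Rocks wants >= 1 GB)
    lspci | grep -i ethernet       # how many NICs, and what kind
    ethtool eth0 | grep -i speed   # is the port GigE-capable?
    dmidecode -t bios              # BIOS vendor/version, to look up PXE support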
I have a small cluster made of five salvaged Dell Precision 410s (dual
Pentium III)
running Rocks 4.3, and it works well.
For old hardware Rocks is a very good solution, requiring a modest
investment of time,
and virtually no money.
(In my case I only had to buy cheap SOHO switches and Ethernet cables,
but you probably already have switches.)
If you are going to run parallel programs with MPI,
the cheapest thing would be to have GigE ports and switches.
I wouldn't invest in a fancier interconnect on such an old machine.
(Do you have any fancier interconnect already, say Myrinet?)
However, you can buy cheap GigE NICs for $15-$20, and high end ones (say
Intel Pro 1000) for $30 or less.
This would be needed only if you don't have GigE ports on the nodes already.
Probably your motherboards have dual GigE ports, I don't know.
MPI over 100 Mbit (Fast) Ethernet is a real pain; don't do it unless you
are a masochist.
A 64-port GigE switch to support MPI traffic would also be a worthwhile
investment.
Keeping MPI on a separate network, distinct from the I/O and cluster
control net, is a good thing.
It avoids contention and improves performance.
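With Open MPI, for instance, you can pin the MPI traffic to the dedicated
interface; the interface name below is just an example:

    mpirun -np 16 -hostfile mynodes \
           --mca btl tcp,self \
           --mca btl_tcp_if_include eth1 \
           ./my_mpi_program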
A natural precaution would be to back up all home directories, and any
other precious data or filesystems, before you start.
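Plain rsync to another machine (or an external disk) is enough; host and
paths below are just placeholders:

    rsync -aH --numeric-ids /home/ backuphost:/backup/cluster_home/
    rsync -aH /etc/ backuphost:/backup/headnode_etc/   # configs are worth keeping too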
I suggest sorting out the hardware issues before anything else.
It would be good to evaluate the status of your RAID,
and perhaps use that particular node as a separate storage appliance.
You can try just rebuilding the RAID and see if it works, or else replace
the defective disk(s), if the RAID controller is still good.
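If it turns out to be Linux software RAID (md) rather than a hardware
controller, status and rebuild look roughly like this (device names are
examples; a hardware controller will have its own management tool instead):

    cat /proc/mdstat                  # overall array status
    mdadm --detail /dev/md0           # per-array detail, shows failed members
    mdadm /dev/md0 --fail /dev/sdb1 --remove /dev/sdb1   # drop a bad disk
    mdadm /dev/md0 --add /dev/sdc1    # add the replacement; rebuild starts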
Another thing to look at is how functional your Ethernet (or GigE)
switch or switches are,
and, if you have more than one switch, how they are (or can be) connected
to each other.
(One for the whole cluster? Two or more separate? Some specific topology
connecting many switches?)
I hope this helps,
Gus Correa
_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org
To change your subscription (digest mode or unsubscribe) visit
http://www.beowulf.org/mailman/listinfo/beowulf