Re: [Beowulf] Clusters and Distro Lifespans

Joe Landman Wed, 19 Jul 2006 10:23:50 -0700

Robert G. Brown wrote:

On Wed, 19 Jul 2006, Stu Midgley wrote:

We also have our install process configured to allow booting different
distros/images, which is useful to boot diagnostic cd images etc.


Good point and one I'd forgotten to mention.  It is really lovely to
keep a PXE boot image pointed at tools like memtest86, a freedos image
that can e.g. flash bios or do other stuff that expects an environment
that can execute a MS .exe, boot into a diskless config for repair
purposes (or to bring up a node diskless while waiting for a replacement
disk).
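
A minimal sketch of the kind of PXE boot menu described above, assuming the boot server uses PXELINUX (the thread does not say which loader is in use). It renders the text of a config file offering memtest, a FreeDOS image and a diskless repair boot. The file names, labels and the `BootEntry` helper are hypothetical; only the directive names (DEFAULT, PROMPT, TIMEOUT, LABEL, MENU LABEL, KERNEL, APPEND) are standard PXELINUX syntax.

```python
from dataclasses import dataclass


@dataclass
class BootEntry:
    label: str        # short name typed or selected at the boot prompt
    title: str        # human-readable menu text
    kernel: str       # file to load, relative to the TFTP root
    append: str = ""  # optional arguments passed to the kernel or to memdisk


def render_pxe_menu(entries, default_label, timeout_tenths=100):
    """Return the text of a PXELINUX config file offering the given entries."""
    labels = [e.label for e in entries]
    if default_label not in labels:
        raise ValueError(f"default {default_label!r} is not one of {labels}")
    # PXELINUX measures TIMEOUT in tenths of a second.
    lines = [f"DEFAULT {default_label}", "PROMPT 1",
             f"TIMEOUT {timeout_tenths}", ""]
    for e in entries:
        lines.append(f"LABEL {e.label}")
        lines.append(f"  MENU LABEL {e.title}")
        lines.append(f"  KERNEL {e.kernel}")
        if e.append:
            lines.append(f"  APPEND {e.append}")
        lines.append("")
    return "\n".join(lines)


# Hypothetical file names; substitute whatever sits in your own TFTP root.
ENTRIES = [
    BootEntry("memtest", "Memtest86", "memtest"),
    BootEntry("freedos", "FreeDOS (BIOS flashing)", "memdisk",
              "initrd=freedos.img"),
    BootEntry("diskless", "Diskless repair boot", "vmlinuz-diskless",
              "initrd=initrd-diskless.img"),
]

if __name__ == "__main__":
    print(render_pxe_menu(ENTRIES, default_label="memtest"))
```

The output would typically be written to pxelinux.cfg/default, or to a per-node file if only one machine should see the diagnostic menu.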


[...]

The tools we set up do all of this, and for those who are brave (or
foolish, not sure which) we also have dban ... .  Still working on
getting Knoppix to do this.  I know it's possible, but I haven't seen
docs on how to do it.

Honestly, for MOST work people do with clusters, running pretty much the
(PXE-installable) distro of your choice will almost certainly work.  I
tend to use FC-even or Centos (a.k.a. FC-even-frozen) on cluster nodes
simply because we have long since gotten to where we can make RH-derived
distributions jump through hoops.  With Seth Vidal in charge of the core
mirrors and repos, Duke is "Repo World" not just to campus but to much
of the world.  Heck, I PXE-boot and kickstart install my systems at
HOME using mirrors of the duke repos, and if I ever bothered to figure
out Icon's toolset for customizing kickstart boots per system (using
some very clever CGI scripts and a bit of XML) it would make those
installs even easier than they are now.

Sadly, not all distros do yum, nor do all distros have sensible
dependency trees, nor even sane/common naming.

SuSE as of 10.0 can work with yum.  We have/host a repo for
ourselves/customers.  The problem is that yum is not a first class
system tool on SuSE like rug or zmd or whatever.  Which means that there
are things that break yum under SuSE that don't break running
Yast/zmd/rug.  Grrrrr.  (If anyone from SuSE is reading, this was a
really bad idea; go to yum, and your life, my life, and your customers'
lives will be *much* easier).  Well, there is that, and yum on 10.1 is
slightly borked.

> iii) Do people regularly upgrade their clusters in relation to
> distros?  I guess this is like asking how long is a piece of string
> because everyone's needs are different.

Cluster upgrades are rare unless you are missing functionality or
something is broken.  That is of course one opinion, some here do
upgrades nightly.  From a purely production oriented viewpoint, where
downtime == lost money for our customers, we usually advise against
that.


I think rare is a strong word.  Infrequent may be better.  We
regularly apply patches and upgrades to the front end nodes (globally
connected) and infrequently (~ every 6 months) upgrade all the cluster
nodes in the rolling fashion mentioned above.
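
A rough sketch of what a rolling node upgrade could look like, assuming you cap how many nodes are out of service at once and stop as soon as a batch fails a health check. The `upgrade` and `healthy` callables are hypothetical stand-ins for whatever a site actually uses (a kickstart reinstall, a yum run, a scheduler drain); nothing here reflects a specific poster's tooling.

```python
def rolling_batches(nodes, max_down):
    """Split nodes into consecutive batches of at most max_down nodes."""
    if max_down < 1:
        raise ValueError("max_down must be at least 1")
    return [nodes[i:i + max_down] for i in range(0, len(nodes), max_down)]


def rolling_upgrade(nodes, max_down, upgrade, healthy):
    """Upgrade nodes batch by batch; stop at the first unhealthy batch.

    upgrade(node) and healthy(node) are caller-supplied (hypothetical)
    hooks. Returns the list of nodes upgraded and verified so far.
    """
    done = []
    for batch in rolling_batches(nodes, max_down):
        for node in batch:
            upgrade(node)
        bad = [n for n in batch if not healthy(n)]
        if bad:
            # Leave the remaining nodes untouched so jobs can keep running.
            raise RuntimeError(f"stopping: unhealthy after upgrade: {bad}")
        done.extend(batch)
    return done


if __name__ == "__main__":
    nodes = [f"node{i:02d}" for i in range(1, 6)]
    upgraded = rolling_upgrade(
        nodes, max_down=2,
        upgrade=lambda n: print("upgrading", n),
        healthy=lambda n: True,
    )
    print("done:", upgraded)
```

Stopping at the first bad batch keeps the blast radius to `max_down` nodes, which is the point of upgrading in a rolling fashion rather than all at once.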

I assume that rare == infrequent.  Basically the argument for production
cycle shops is that you don't upgrade unless there is a need to.  That
is, stuff could/does break with upgrades, and you have to be really
careful.  Test test test.  If you need a security patch, I am not sure
any production cycle shop considers this an upgrade, but again, test
test test.  The rule of thumb that I see followed is "if it ain't
broke, don't fix it".

If you install new hardware, you likely need newer kernels and drivers
to deal with it (say like SATA and RHEL4 before U1).


You can even do kernel upgrades to the file servers/front end nodes
(which requires a reboot) without killing or disrupting jobs.  Having
complete control has a lot of benefits.

It does, and you often need a fairly competent staff around to make this
work.  There is a shortage of Mark Hahns in the world, so not every
site can work the stuff he does.  Similarly for other sites.


[...]

On the whole, though, updates are there for a reason and STABILIZE
systems more often than they DESTABILIZE them.

The last Centos 4.3 x86_64 kernel update almost nuked one of our very
important servers.  Had to back it out, and thankfully I had backups of
the affected files.  Updates are *supposed* to increase stability.  They
don't always do that.  Remember that an update is brain surgery; if you
treat it as anything less than that, you are going to be burned someday.
The folks advising caution are not advising it because they like to be
cautious, but because they have been burned before, and they don't want
to see others fall into the same behavior that burned them.
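
A minimal sketch of the "have backups of the affected files" habit, assuming you know in advance which files an update may touch. It copies them aside before the update and copies them back if the update has to be backed out. The function names and the backup layout are hypothetical, and this is not a substitute for real system backups or for the package manager's own rollback.

```python
import shutil
from pathlib import Path


def backup_files(paths, backup_dir):
    """Copy each file under backup_dir, mirroring its absolute path.

    Returns a dict mapping each original path to its backup copy.
    """
    backup_dir = Path(backup_dir)
    saved = {}
    for p in paths:
        original = Path(p).resolve()
        # Drop the leading "/" so the path can be re-rooted under backup_dir.
        dest = backup_dir / original.relative_to(original.anchor)
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(original, dest)  # copy2 also preserves timestamps/mode
        saved[original] = dest
    return saved


def restore_files(saved):
    """Copy every backed-up file back over its original location."""
    for original, copy in saved.items():
        shutil.copy2(copy, original)
```

Typical use would be to call `backup_files` on the boot loader config and any hand-edited files before the update, reboot and test, and call `restore_files` only if the update has to be backed out.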

rgb




--

Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics LLC,
email: [EMAIL PROTECTED]
web  : http://www.scalableinformatics.com
phone: +1 734 786 8423
fax  : +1 734 786 8452 or +1 866 888 3112
cell : +1 734 612 4615

_______________________________________________
Beowulf mailing list, [email protected]
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf
