Robert G. Brown wrote:
On Wed, 19 Jul 2006, Stu Midgley wrote:

We also have our install process configured to allow booting different
distros/images, which is useful to boot diagnostic cd images etc.

Good point, and one I'd forgotten to mention.  It is really lovely to
keep PXE boot images pointed at tools like memtest86, a FreeDOS image
that can e.g. flash a BIOS or do other stuff that expects an environment
that can execute an MS .exe, or a diskless config that you can boot into
for repair purposes (or to bring up a node diskless while waiting for a
replacement disk).
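
As a rough illustration of the multi-image PXE setup described above, here is a minimal Python sketch that renders a PXELINUX-style menu with one entry per tool. The labels, the file names (memtest, memdisk, freedos.img, vmlinuz-diskless, initrd-diskless.img) and the NFS server address are all hypothetical; the thread does not say how either site actually lays out its boot images.

```python
from dataclasses import dataclass


@dataclass
class BootEntry:
    """One LABEL stanza in a PXELINUX config file."""
    label: str
    kernel: str
    append: str = ""


# Hypothetical entries; every path and address here is a placeholder.
EXAMPLE_ENTRIES = [
    BootEntry("memtest", "memtest"),
    # memdisk (shipped with Syslinux) boots a floppy/disk image such as FreeDOS.
    BootEntry("freedos", "memdisk", "initrd=freedos.img"),
    # Diskless repair boot using an NFS root filesystem.
    BootEntry(
        "diskless",
        "vmlinuz-diskless",
        "initrd=initrd-diskless.img root=/dev/nfs "
        "nfsroot=192.0.2.10:/export/diskless ip=dhcp",
    ),
]


def render_pxelinux_menu(entries, default_label, timeout_ds=100):
    """Return the text of a PXELINUX config offering the given entries.

    timeout_ds is in tenths of a second, the unit PXELINUX uses for TIMEOUT.
    """
    if default_label not in [e.label for e in entries]:
        raise ValueError(f"default label {default_label!r} is not defined")
    lines = [f"DEFAULT {default_label}", "PROMPT 1", f"TIMEOUT {timeout_ds}", ""]
    for entry in entries:
        lines.append(f"LABEL {entry.label}")
        lines.append(f"  KERNEL {entry.kernel}")
        if entry.append:
            lines.append(f"  APPEND {entry.append}")
        lines.append("")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_pxelinux_menu(EXAMPLE_ENTRIES, default_label="memtest"))
```

Writing the returned text to pxelinux.cfg/default under the TFTP root would be the usual next step, though that location is again an assumption about the local setup.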

[...]

The tools we set up do all of this, and for those who are brave (or foolish, not sure which) we also have dban ... . Still working on getting Knoppix to do this; I know it's possible, but I haven't seen docs on how to do it.

Honestly, for MOST work people do with clusters, running pretty much the
(PXE-installable) distro of your choice will almost certainly work.  I
tend to use FC-even or CentOS (a.k.a. FC-even-frozen) on cluster nodes
simply because we have long since gotten to where we can make RH-derived
distributions jump through hoops.  With Seth Vidal in charge of the core
mirrors and repos, Duke is "Repo World" not just to campus but to much
of the world.  Heck, I PXE-boot and kickstart install my systems at
HOME using mirrors of the duke repos, and if I ever bothered to figure
out Icon's toolset for customizing kickstart boots per system (using
some very clever CGI scripts and a bit of XML) it would make those
installs even easier than they are now.

Sadly, not all distros do yum, nor do all distros have sensible dependency trees, nor even sane/common naming.

SuSE as of 10.0 can work with yum. We have/host a repo for ourselves/customers. The problem is that yum is not a first-class system tool on SuSE like rug or zmd or whatever, which means that there are things that break yum under SuSE that don't break running YaST/zmd/rug. Grrrrr. (If anyone from SuSE is reading, this was a really bad idea; go to yum, and your life, my life, and your customers' lives will be *much* easier.) Well, there is that, and yum on 10.1 is slightly borked.

> iii) Do people regularly upgrade their clusters in relation to
> distros?  I guess this is like asking how long is a piece of string
> because everyone's needs are different.

Cluster upgrades are rare unless you are missing functionality or
something is broken.  That is of course one opinion; some here do
upgrades nightly.  From a purely production-oriented viewpoint, where
downtime == lost money for our customers, we usually advise against that.

I think rare is a strong word.  Infrequent may be better.  We
regularly apply patches and upgrades to the front end nodes (globally
connected) and infrequently (~ every 6 months) upgrade all the cluster
nodes in the rolling fashion mentioned above.
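
For illustration only, a rolling upgrade of this kind might be sketched in Python as below. The details of the "rolling fashion mentioned above" are not part of this message, so the batch size and the upgrade and healthy callbacks are hypothetical stand-ins for whatever a site really uses to drain, update and check a node. The sketch halts at the first batch containing an unhealthy node, so a bad update cannot spread across the whole cluster.

```python
def rolling_batches(nodes, batch_size):
    """Yield successive slices of nodes, each at most batch_size long."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(nodes), batch_size):
        yield nodes[start:start + batch_size]


def rolling_upgrade(nodes, batch_size, upgrade, healthy):
    """Upgrade nodes one batch at a time, stopping at the first bad batch.

    upgrade(node) and healthy(node) are caller-supplied callbacks
    (hypothetical here). Returns (done, failed): done lists the nodes in
    batches that passed the health check, failed lists the unhealthy nodes
    of the batch that stopped the run (empty if everything passed).
    """
    done = []
    for batch in rolling_batches(nodes, batch_size):
        for node in batch:
            upgrade(node)
        failed = [node for node in batch if not healthy(node)]
        if failed:
            # Leave the remaining batches untouched so jobs keep running there.
            return done, failed
        done.extend(batch)
    return done, []


if __name__ == "__main__":
    cluster = [f"node{i:02d}" for i in range(1, 9)]  # made-up node names
    touched = []
    done, failed = rolling_upgrade(
        cluster, 3, upgrade=touched.append, healthy=lambda n: n != "node05"
    )
    print("upgraded and healthy:", done)
    print("failed health check:", failed)
```

Keeping the batch small relative to the cluster limits how many nodes are exposed to an untested update at once, which is the "test test test" concern raised in this thread.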

I assume that rare == infrequent. Basically, the argument for production cycle shops is that you don't upgrade unless there is a need to. That is, stuff could/does break with upgrades, and you have to be really careful. Test test test. If you need a security patch, I am not sure any production cycle shop considers this an upgrade, but again, test test test. The rule of thumb that I see followed is "if it ain't broke, don't fix it".

If you install new hardware, you likely need newer kernels and drivers to deal with it (say, SATA with RHEL4 before U1).


You can even do kernel upgrades to the file servers/front end nodes
(which requires a reboot) without killing or disrupting jobs.  Having
complete control has a lot of benefits.

It does, and you often need a fairly competent staff around to make this work. There is a shortage of Mark Hahns in the world, so not every site can do the stuff he does. Similarly for other sites.

[...]

On the whole, though, updates are there for a reason and STABILIZE
systems more often than they DESTABILIZE them.

The last CentOS 4.3 x86_64 kernel update almost nuked one of our very important servers. Had to back it out, and thankfully I had backups of the affected files. Updates are *supposed* to increase stability. They don't always do that. Remember that an update is brain surgery; if you treat it as anything less than that, you are going to be burned someday. The folks advising caution are not advising it because they like to be cautious, but because they have been burned before, and they don't want to see others fall into the same behavior that burned them.

   rgb



--

Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics LLC,
email: [EMAIL PROTECTED]
web  : http://www.scalableinformatics.com
phone: +1 734 786 8423
fax  : +1 734 786 8452 or +1 866 888 3112
cell : +1 734 612 4615

_______________________________________________
Beowulf mailing list, [email protected]
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf
