Robert G. Brown wrote:
On Wed, 19 Jul 2006, Stu Midgley wrote:
We also have our install process configured to allow booting different
distros/images, which is useful to boot diagnostic cd images etc.
Good point and one I'd forgotten to mention. It is really lovely to
keep a PXE boot image pointed at tools like memtest86, a freedos image
that can e.g. flash bios or do other stuff that expects an environment
that can execute a MS .exe, boot into a diskless config for repair
purposes (or to bring up a node diskless while waiting for a replacement
disk).
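
As a rough illustration of the kind of PXE boot menu described above, here is a minimal Python sketch that renders a PXELINUX-style config with a local-disk default plus memtest86 and FreeDOS entries. The labels, the image paths under images/, and the function name are all hypothetical, not taken from anyone's actual setup; the FreeDOS entry assumes the floppy image is loaded through syslinux's memdisk.

```python
# Sketch only: every label and path below is an assumption for illustration.
BOOT_ENTRIES = [
    {"label": "local", "text": "Boot from local disk",
     "lines": ["LOCALBOOT 0"]},
    {"label": "memtest", "text": "memtest86 memory test",
     "lines": ["KERNEL images/memtest"]},
    {"label": "freedos", "text": "FreeDOS image (e.g. for BIOS flashing)",
     "lines": ["KERNEL memdisk", "APPEND initrd=images/freedos.img"]},
]


def render_pxelinux_menu(entries, default="local", timeout_tenths=100):
    """Return the text of a PXELINUX config file for the given entries.

    TIMEOUT is expressed in tenths of a second, as PXELINUX expects.
    """
    out = [f"DEFAULT {default}", "PROMPT 1", f"TIMEOUT {timeout_tenths}"]
    for entry in entries:
        out.append("")
        out.append(f"# {entry['text']}")
        out.append(f"LABEL {entry['label']}")
        for line in entry["lines"]:
            out.append(f"  {line}")
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    # Print the config; in practice it would be written under pxelinux.cfg/.
    print(render_pxelinux_menu(BOOT_ENTRIES), end="")
```

Keeping the default entry on local boot means a node that PXE boots by accident simply falls through to its own disk after the timeout.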
[...]
The tools we set up do all of this, and for those who are brave (or
foolish, not sure which) we also have dban ... . Still working on
getting Knoppix to do this; I know it's possible, but I haven't seen
docs on how to do it.
Honestly, for MOST work people do with clusters, running pretty much the
(PXE-installable) distro of your choice will almost certainly work. I
tend to use FC-even or CentOS (a.k.a. FC-even-frozen) on cluster nodes
simply because we have long since gotten to where we can make RH-derived
distributions jump through hoops. With Seth Vidal in charge of the core
mirrors and repos, Duke is "Repo World" not just to campus but to much
of the world. Heck, I PXE-boot and kickstart install my systems at
HOME using mirrors of the duke repos, and if I ever bothered to figure
out Icon's toolset for customizing kickstart boots per system (using
some very clever CGI scripts and a bit of XML) it would make those
installs even easier than they are now.
Sadly, not all distros do yum, nor do all distros have sensible
dependency trees, nor even sane/common naming.
SuSE as of 10.0 can work with yum. We have/host a repo for
ourselves/customers. The problem is that yum is not a first class
system tool on SuSE like rug or zmd or whatever. Which means that there
are things that break yum under SuSE that don't break running
Yast/zmd/rug. Grrrrr. (If anyone from SuSE is reading, this was a
really bad idea; go to yum, and your life, my life, and your customers'
lives will be *much* easier). Well, there is that, and yum on 10.1 is slightly
borked.
> iii) Do people regularly upgrade their clusters in relation to
> distros? I guess this is like asking how long is a piece of string
> because everyone's needs are different.
Cluster upgrades are rare unless you are missing functionality or
something is broken. That is of course one opinion, some here do
upgrades nightly. From a purely production oriented viewpoint, where
downtime == lost money for our customers, we usually advise against
that.
I think rare is a strong word. Infrequent may be better. We
regularly apply patches and upgrades to the front end nodes (globally
connected) and infrequently (~ every 6 months) upgrade all the cluster
nodes in the rolling fashion mentioned above.
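
The rolling procedure itself is not spelled out in this message, so the following is only a sketch of one common shape for it: upgrade a few nodes at a time, check them, and stop at the first batch that fails. The upgrade and healthy callbacks are hypothetical stand-ins for whatever a site actually uses (draining the node in the scheduler, kickstart reinstall, a test job, and so on).

```python
def rolling_batches(nodes, batch_size):
    """Split the node list into consecutive batches of at most batch_size."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [nodes[i:i + batch_size] for i in range(0, len(nodes), batch_size)]


def rolling_upgrade(nodes, batch_size, upgrade, healthy):
    """Upgrade nodes batch by batch, stopping at the first unhealthy batch.

    upgrade(node) and healthy(node) are caller-supplied (hypothetical) hooks.
    Returns (nodes upgraded and verified, nodes that failed the health check).
    """
    done = []
    for batch in rolling_batches(nodes, batch_size):
        for node in batch:
            upgrade(node)
        failed = [node for node in batch if not healthy(node)]
        if failed:
            # Stop here so the rest of the cluster stays on the old image.
            return done, failed
        done.extend(batch)
    return done, []
```

Stopping at the first bad batch limits the damage of a broken update to batch_size nodes, which is the main point of doing it in a rolling fashion.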
I assume that rare == infrequent. Basically the argument for production
cycle shops is that you don't upgrade unless there is a need to. That
is, stuff could/does break with upgrades, and you have to be really
careful. Test test test. If you need a security patch, I am not sure
any production cycle shop considers this an upgrade, but again, test
test test. The rule of thumb that I see followed is "if it ain't
broke, don't fix it".
If you install new hardware, you likely need newer kernels and drivers
to deal with it (say like SATA and RHEL4 before U1).
You can even do kernel upgrades to the file servers/front end nodes
(which requires a reboot) without killing or disrupting jobs. Having
complete control has a lot of benefits.
It does, and you often need a fairly competent staff around to make this
work. There is a shortage of Mark Hahns in the world, so not every
site can work the stuff he does. Similarly for other sites.
[...]
On the whole, though, updates are there for a reason and STABILIZE
systems more often than they DESTABILIZE them.
The last CentOS 4.3 x86_64 kernel update almost nuked one of our very
important servers. Had to back it out, and thankfully I had backups of
the affected files. Updates are *supposed* to increase stability. They
don't always do that. Remember that an update is brain surgery; if you
treat it as anything less than that, you are going to be burned someday.
The folks advising caution are not advising it because they like to be
cautious, but because they have been burned before, and they don't want
to see others fall into the same behavior that burned them.
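
For what it's worth, the "have backups so you can back it out" habit can be sketched in a few lines of Python. This is not how the server above was actually recovered; the function name and the apply_update and verify callbacks are hypothetical, and a real kernel back-out involves more than copying files back.

```python
import shutil
import tempfile
from pathlib import Path


def update_with_backout(paths, apply_update, verify):
    """Copy the given files aside, run an update, and restore them on failure.

    apply_update() and verify() are caller-supplied (hypothetical) hooks.
    Returns True if the update was kept, False if it was backed out.
    """
    backup_dir = Path(tempfile.mkdtemp(prefix="pre-update-"))
    saved = {}
    for index, original in enumerate(Path(p) for p in paths):
        copy = backup_dir / f"{index}-{original.name}"
        shutil.copy2(original, copy)  # copy2 also preserves timestamps and modes
        saved[original] = copy
    try:
        apply_update()
        ok = bool(verify())
    except Exception:
        ok = False
    if not ok:
        # Back the update out by putting the saved copies back in place.
        for original, copy in saved.items():
            shutil.copy2(copy, original)
    # The backup directory is left behind on purpose for later inspection.
    return ok
```

The verify step is where the "test test test" goes; without it, the backup only helps after someone notices the breakage by hand.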
rgb
--
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics LLC,
email: [EMAIL PROTECTED]
web : http://www.scalableinformatics.com
phone: +1 734 786 8423
fax : +1 734 786 8452 or +1 866 888 3112
cell : +1 734 612 4615
_______________________________________________
Beowulf mailing list, [email protected]
To change your subscription (digest mode or unsubscribe) visit
http://www.beowulf.org/mailman/listinfo/beowulf