On Wed, 4 Nov 1998, Mark Hahn wrote:

> > Believe it or not, there are a few (ahem) folks running SMP systems in
> > production, and the 2.0 kernels may not be the cat's meow, but they
> 
> any 2.0 SMP box that is not strictly cpu-bound is basically crippled.

Sure, but many, many SMP boxes are purchased to do calculations.
Calculations are generally strictly CPU bound.  Even coarse-to-medium
grained network parallel beowulfish tasks are as likely to be CPU bound
as not, and won't benefit significantly from finer grained kernel locks
anyway because there aren't NON-network kernel tasks in abundance to run
in parallel.

> > Besides it's hard to brag about months of uptime if you're rebooting to
> > install new kernels.  :-)
> 
> this doesn't imply that you must run each one.  can anyone produce evidence 
> that no 2.1 is as stable as 2.0?  2.1 is the product of over 2 years of
> concerted effort, most of which will never benefit anyone running 2.0.
> 
> remember, 2.0 is basically bug-fixes since June 6, 1996.

I think that the point is that it is a moderate chore to upgrade a
stable, functioning 2.0.X operation into a stable, functioning 2.1.X
operation.  None of the main distributions (that I know of, anyway)
ship with 2.1.X as the main kernel, for the excellent reason that they
need a kernel that runs on nearly any platform/hardware combination,
and they will be pestered to death by owners wherever it doesn't work.
From sitting on this
list for years now, it is perfectly apparent that there are many
hardware combinations where any given 2.1.X doesn't work, sometimes for
no obvious reason.  Then there are the hardware drivers, not all of
which are as uniformly supported in 2.1.X as in 2.0.X (just from
glancing over the hardware parts of the kernel config process).  This is
a moving target and I'm sure it has improved since I last looked, but
again, Red Hat needs to have "universal" hardware support in a
shrinkwrap linux distribution.

To upgrade by hand, one has to go down a list of auxiliary support
(genksyms, libc, etc.) and upgrade all of it to an "approved" revision.
This is not particularly difficult, and it all still works with 2.0.x,
so one can do it without destabilizing anything, but it is some work
and may -- in the case of e.g. nfs support -- cause some headaches.  It
has caused me some in the past, anyway.  So you get all this done, and
then YOU face the same problem as
the Red Hat people, scaled to your particular part of the microcosm.  If
your operation consists of only one system, or if the hardware in your
operation is largely homogeneous, again there is no particular problem.
You cut a 2.1.x kernel, configured for your hardware, install it, and
boot.  Then comes the debugging part -- maybe it fails to boot, maybe it
boots but comes up broken, maybe it boots and comes up but crashes after
a bit when you stress test it, and maybe it works perfectly.  Again, on
this list we've seen this process recapitulated over and over again (and
yes, it can even happen following a 2.0.[x -> x+n] upgrade, although
this is usually less strenuous an exercise).
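The "go down a list of auxiliary support" step above can be sketched as
a small script that compares installed package versions against the
minimums listed in the kernel's Documentation/Changes file.  This is
only a sketch: the version numbers below are illustrative placeholders,
not the actual 2.1.x requirements, and in real use the installed
versions would be scraped from the tools themselves (e.g. `ld -v`,
`ldd --version`) rather than hard-wired.

```shell
#!/bin/sh
# Pre-upgrade sanity check: compare installed support-software
# versions against required minimums.  All version numbers here are
# made-up placeholders -- consult Documentation/Changes in the actual
# kernel tree you intend to build.

# version_ge A B -> true if dotted version A >= B
version_ge() {
    [ "$1" = "$2" ] && return 0
    # numeric sort on each dotted field; the smaller version sorts
    # first, so if A >= B then B must be the first line of output
    first=$(printf '%s\n%s\n' "$1" "$2" | \
            sort -t. -k1,1n -k2,2n -k3,3n -k4,4n -k5,5n | head -n 1)
    [ "$first" = "$2" ]
}

# check NAME INSTALLED MINIMUM -> one status line per package
check() {
    if version_ge "$2" "$3"; then
        echo "$1 $2 ok (need >= $3)"
    else
        echo "$1 $2 TOO OLD (need >= $3)"
    fi
}

# Hypothetical installed versions vs. hypothetical minimums:
check binutils 2.8.1.0.23 2.8.1.0.23
check genksyms 2.1.18     2.1.18
check libc     5.4.44     5.4.46
```

Anything flagged TOO OLD goes on the upgrade list before the kernel
build is even attempted.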

If your system is INhomogeneous, with lots of different hardware
combinations, you may well go through parts of this exercise many
times.  On some machines you may have immediate success; on others you
may have NO success and have to turn to the list for help, or even hit
a real bug in the particular 2.1.X you use.  You then have to back off
to X-n or wait for X+n to try to get a single kernel you can use on
all your hardware.

I would estimate that this job for our local operation would take a
week at minimum.  The two times I have followed the exercise up to
running a trial kernel on a single system it took me around three days
to finally get something that appeared stable, and then my measurements
of the network subsystem showed that single threaded, unchallenged
netperf performance was WORSE than with 2.0.X kernels.  I believe that
this has probably been fixed in the meantime, but the thought of
wrestling the system for a day or two (assuming I don't have to upgrade
all the auxiliary support software yet again) to find out is still
daunting.  Then there are the other ten or fifteen hardware combinations
I'd have to debug to implement it -- another few days work, IF it works.
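The netperf comparison described above amounts to running a
single-stream test on each kernel and comparing the throughputs.  A
minimal sketch, with the caveat that the two throughput figures below
are hypothetical placeholders and not actual 2.0-vs-2.1 measurements:

```shell
#!/bin/sh
# On each kernel, one would run a single-stream TCP test such as:
#     netperf -H <server> -t TCP_STREAM -l 30
# and record the reported throughput.  The two figures below are
# made-up placeholders standing in for the two kernels' results.

old_mbps=85.1   # hypothetical 2.0.x single-stream throughput
new_mbps=78.4   # hypothetical 2.1.x single-stream throughput

# Percent change relative to the 2.0.x baseline; awk does the
# floating-point arithmetic that plain sh lacks.
delta=$(awk -v o="$old_mbps" -v n="$new_mbps" \
        'BEGIN { printf "%.1f", 100 * (n - o) / o }')
echo "2.1.x vs 2.0.x single-stream throughput: ${delta}%"
```

A negative delta is the "WORSE than with 2.0.X" situation described
above; repeating the run a few times and averaging would guard against
one-off network noise.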

I do believe you when you say that 2.1.x is better, and I even
understand why.  Don't underestimate the very real cost, though, of
upgrading a network to run 2.1.x everywhere.  At the moment, this is
hard to justify in terms of cost (human) vs benefit (improved handling
of interrupt locks and IO-APIC support).  I don't think it would make
two CPU-minutes' difference in productivity on the coarse grained CPU
bound tasks that predominate on our network, and I'm not worried about
increasing interactive responses by another skillionth of a second for
GUI users.

One day I will bite the bullet and convert everything over -- 2.0.36
sounds like it may be the last 2.0.x kernel (if it does, at last, fix
all the 2.0.x bugs), and although its stability (it has been awesomely
stable under SMP ever since 2.0.33) is tempting, it will eventually
become a trap.  The point when RH and Slackware and Debian offer
ready-to-run 2.1.x (or more likely 2.2.x) support will be a good upper
bound on when that happens, but it will probably be before then.  I
think it is easy to understand, though, why a
reasonable systems person might choose to continue running 2.0.x and opt
for stability and immediate productivity rather than endure the work
burden and destabilization attendant to a full kernel upgrade.  It is a
serious undertaking even for an expert/professional.

    rgb

Robert G. Brown                        http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:[EMAIL PROTECTED]
