On Mon, 12 Apr 1999 [EMAIL PROTECTED] wrote:

> 
> > Hmmmm, it might be worth noting that linux has never worked on
> > "everything" and probably never will -- the hardware side moves, the
> > software side moves, everything is dynamic and there are always bugs and
> > broken things.  So your observation that there are occasional problems
> > with 2.0.36 (or any other revision) SMP isn't really an adequate excuse
> > for not supporting it even on a "at your own risk" basis with RPM's.
> > After all, 2.0.36 UP has occasional problems on at least some hardware
> > but it doesn't stop RH from selling Linux with 2.0.36 UP, and 2.0.x SMP
> > has run on "most" hardware combos for years now. I find the conservative
> > turn RH is taking to be a bit disturbing.
> 
> First, there's a huge difference, by percentage, of the amount of UP boxes
> that don't work vs the amount of SMP boxes that don't work.  

Ah, an observation relevant to the linux-smp list!  Let's see:

Do you have some hard figures on that?  According to the linux-smp FAQ
(and from years' membership of this list) I would expect that the "huge
difference" is something like 98% (SMP) to 99.5% (UP) (or even higher in
both with even smaller margins) with the SMP problems concentrated in a
few, very specific, motherboards and hardware combinations.  I'd bow to
any authoritative, non-anecdotal evidence, though.

> Second, this isn't a "RH" conservative turn at all.  RH has *never* provided
> or supported an SMP kernel, EVER.  There is no turn.  This is the way it
> always has been.  So, this issue lies with whether Dell wants to go out on
> a limb and do something custom. 

Again, having done SMP almost exclusively since shortly after 2.0.0 was
released with nearly flawless operation even when the 2.0.x kernel
really did have deadlock problems, I don't view this as "going out on a
limb".  

However, I do have some very current, very relevant data on this that I
can contribute to the list.  Recall that the original discussion
addressed Dell PowerEdge systems being sold with RH linux installed,
multiple processors, and a UP kernel.  Just what is the "risk"
associated with running them under 2.0.36 SMP?

As it happens, we have 16 Dell PowerEdge 2300's (all dual 400MHz PII's)
donated to the University by an Intel equipment grant.  When we first
got them last July, their onboard AIC 7890 U2W controller wasn't
supported.  I worked with Doug Ledford (as best I could, mostly by
whining -- he did all the hard part:-) to get the driver working and at
5.1.something basically he succeeded.  After running the systems for
some months (using 2.0.33 as a base) with the new drivers with no
problems, I upgraded them to 2.0.36.  Straight compile and install,
aic7xxx and eepro100.

Of these 16 systems, four are on desktops and one of those has had
chronic hardware (NOT kernel) difficulties.  The desktop systems tend to
get rebooted, not because of kernel lockups or problems but because of
maintenance, an ongoing and annoying amd problem that forced a reboot of
most of our desktops a month or two ago after a change in our base
networking, or user ignorance.  Still they've been up an average of more
than sixty five days; not too shabby, given that the hardware-afflicted
one has only been up four (the average of the other three is well over
eighty days).

The 12 systems that form the core of our beowulf, all running stock,
unpatched 2.0.36 have been kept under continual load of 1-1.5 per CPU
(2-3 overall) with moderate continuous and occasionally heavy/bursty
network traffic.  Their uptimes range from let's see, 105 days to 111
days.  Now mind you, these systems aren't running Red Hat.  They are
actually running a melange of old Slackware, a few upgrades of e.g.
modutils and the like, libc 5.4.44 (I know, we're out of touch;-) and an
absolutely stock, unpatched 2.0.36.  Then there are the other twelve or
thirteen SMP systems in our department net on desktops -- mostly PPro's
and PII's -- with a fair range of hardware onboard and with uptimes that
range from 5 days to almost 200.  I think that the record in our
department is a system that has to be running 2.0.33 or maybe even
2.0.32 and has been up for almost a year.  The way it's going, it will
stay up until we "upgrade" to Red Hat this spring.

Now, two questions:  

First, are you implying that Red Hat is LESS stable with 2.0.x linux SMP
than this hard data clearly suggests (that is, should we reconsider
converting to Red Hat as our base distribution)?  I certainly hope not.
;-)

Second, >>where is the limb<< that Dell is supposedly going out on?  A
hundred days of uptime isn't the whole story; the story isn't finished
-- there have been NO CRASHES WHATSOEVER on the twelve workhorse
compute-server systems.

At the moment, I have no reason to believe that these systems wouldn't
be perking right along two years from now, never rebooted, if the
vagaries of the Power Company and a need to move on with life and
upgrade the installation didn't mandate a reboot in the meantime.  If we
ever got amd to properly stabilize, I think that I could extend that
observation to the entire department, SMP and UP systems alike.

As far as I'm concerned, on all but a tiny fraction of SMP systems
2.0.36 is awesomely, boringly STABLE.  Alan Cox, et al., have done a
fabulous job -- they're down to patching bugs that affect only a
minuscule fraction of all SMP systems built for a few users a wee bit of
the time; at that, usually ones (too old or too new) with some feature
or another that departs from the approximate "standard".

Now, we're still looking forward to installing 2.2.x (right after the RH
upgrade).  Reports on this list (and my own still growing base of
experience with it) have it both technically superior in many ways and
amazingly stable for its revision number (although I would expect LESS
stable than 2.0.36, making it surprising that RH plans to support SMP
only under 2.2.x if stability is really your primary concern).  

I certainly hope that we aren't "going out on a limb" by expecting that
Red Hat will support SMP operation, with 2.0.x or 2.2.x, any LESS
flawless than we are already observing with our existing hodge-podge
linux installation and 2.0.36.  I also hope that we can look forward to
at least modest support from Red Hat in the event that we encounter
difficulties -- we are buying RH CD's fairly regularly now and are
mostly self-supporting via membership in the various relevant linux
lists anyway.

> > Still, corporate policy is corporate policy, so let's accept it for the
> > moment.  The real question, Doug, is why doesn't RH provide "make
> > install"-ready kernel sources in /usr/src/linux, installed by default,
> > in the 5.2 installation?  To quote from the GPL:
> 
> We do provide all sources, but we do *not* install them all by default.
> That would be dumb.  We've never done that and we likely never will.
> *Most* people don't need to recompile their kernel and thus don't need
> the kernel sources.  But, there is a kernel-source binary RPM (as Doug

Our experience here obviously differs.  However, I'd be happy to poll
DULUG, our Duke Linux User's Group, and get some hard data on:

  a) Whether RH users have, on the average, needed to rebuild a kernel
and
  b) Whether RH users (including ones that haven't needed to rebuild a
kernel) would consider it "dumb" to include the make ready kernel
sources in the standard install, at least as a button/question mediated
option.  I mean, it >>is<< trivial to make it a question in the standard
install and let the user decide, isn't it?  Freedom of choice and all
that?

In fact, given that DULUG is "right next door" to Red Hat, it might be
reasonable for Red Hat to consider adopting Duke students as a
prototyping "fishbowl".  It's a big enough population to be
statistically significant, but small enough to be controllable, the
members are, if anything, brighter than average and highly motivated.
At the moment, I and a few other DULUG members with similar experience
provide most of the real support to these students.  I cannot help but
believe that Red Hat could improve their product and support by working
with these students, analyzing their problems and complaints, and
working out scalable solutions that prevent the problems from occurring
or minimize the difficulty of fixing them when they do.

As an indirect support person for Red Hat (and hardly qualified for the
role except in the most general linux/unix terms, although now that I
run 5.2 at home my RH-specific skills are rapidly improving) I must say
that my opinion of Red Hat hasn't changed too much over the last few
years.  

It is one of the most Windows-like Linux distributions, with a full
GUI-driven install.  When it (the install, kernel, and underlying device
drivers) works, it tends to work flawlessly and invisibly.  When it
doesn't work, it is an absolute nightmare to make it work, especially
remotely by telling somebody what to do.  It works maybe 90% of the
time.  The other 10% Red Hat is never able, or willing, to help via a
support request, and the problem is resolved, if at all, by somebody
like me looking over the entire installation and either figuring out how
to get a given driver to work or making whatever custom changes to the
init process are required to get the system functional.  About half get
fixed
that way, and half simply throw up their hands in disgust and try Debian
(if they want packages to be transparently supported) or Slackware (if
they want to learn hands-on unix management).

Now, I personally think that Red Hat would benefit tremendously from
working through the problems experienced by those 10%, at least in a
controlled environment (perhaps they cannot afford to do so for the
entire world just yet -- don't know).  Even if they had only one, tiny
"universe" where they provided "guaranteed customer satisfaction", the
problems they solved there would tremendously improve satisfaction
everywhere else.  I do know that as I have worked through the problems
on behalf of Red Hat (actually, on behalf of the students involved, as I
want their linux experiences to be positive) I have found things in Red
Hat's setup that I like very much or find sane, and other things that
drive me crazy.  It might be useful for a Red Hat person to be driven
crazy by the latter instead to motivate a gradual change...

> You're just plain wrong.  We do include those sources and always have.
> We used to install them by default, but stopped I believe at the release
> of the 2.0 kernel because the kernel sources are 25M and most people 
> just didn't need them.  If we chose to do that then *you* would be 
> happy but we'd have tons of whiners on the redhat-list claiming we
> were "bloated" or some such nonsense.

As I said, make it an install-time choice.  With the smallest disks
currently being sold infinitely large as far as linux is concerned, 25MB
is utterly irrelevant to most folks and I think even those people
building a stripped down system (e.g. a beowulf) would welcome having
the kernel sources already in place on their development host(s).  But
suit yourself.

> > Of course, the main PRACTICAL reason to provide the sources in EXACTLY
> > the form used to build and install the RH kernel is that, given the
> > .config file and the mods to the Makefile required to make it install
> > according to the RH prescription (make all the modules, install in
> > /boot, create various symlinks and so forth) one could, if one wished,
> > go to the kernel source directory, edit the Makefile to uncomment the
> > SMP = 1 line, and type "make install".  It's not just novices that would
> > appreciate this -- I wouldn't mind it myself.  And of course it would
> > make supporting Dell's entire line of SMP servers straightforward, since
> > their hardware is known to work fine with the stock 2.0.36 kernel.
> 
> I've done exactly the above in the past.  

Then why didn't the RH person do just this when the original poster of
the Dell question contacted RH for help?  Why is "How do I make my RH
system run SMP" virtually a linux-smp FAQ?  Why is the immediate problem
encountered by most of these people that they've gotten clean/current
kernel sources and have a terrible time making them install according to
the Red Hat prescription?  Why is the number of people who have
privately contacted me to say "Go get 'em, Rob" continually increasing
as I am giving voice to some concerns that are manifestly shared by
quite a few individuals out there?  Obviously, the word isn't getting
out.  As the maker of "linux for the masses" you have to know that
nobody actually reads manuals -- the only thing many users know of Red
Hat is what they see during their install and learn from their
friends...
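
For what it's worth, the Makefile edit described in the quoted text above
(uncommenting the SMP = 1 line) is small enough to sketch in a few lines.
This is only an illustration in Python: the function name enable_smp and
the sample Makefile text are made up for the example, and it assumes the
switch appears as a commented line of the form "# SMP = 1"; the exact
spacing in any given source tree may differ.

```python
import re

# Assumption: the SMP switch sits in the top-level Makefile as a commented
# line such as "# SMP = 1". Only spaces/tabs are allowed around the tokens
# so the match never spills across line boundaries.
SMP_LINE = re.compile(r"^#[ \t]*(SMP[ \t]*=[ \t]*1)[ \t]*$", re.MULTILINE)


def enable_smp(makefile_text):
    """Uncomment the first commented-out 'SMP = 1' line.

    Returns (new_text, changed), where changed is True only if a
    commented-out line was actually found and rewritten.
    """
    new_text, count = SMP_LINE.subn(r"\1", makefile_text, count=1)
    return new_text, count == 1


# Illustrative input only, not a real kernel Makefile.
sample = "VERSION = 2\n# SMP = 1\nall: vmlinux\n"
patched, changed = enable_smp(sample)
print(patched)
```

The sketch works on text rather than on a file so that it can be tried
without touching a real source tree; the rebuild and install steps still
have to follow whatever prescription the distribution uses.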

> > Isn't RH taking a terrible risk by NOT doing this that both Dell and
> > innocent users who buy high end Dell 2300 and 6300 systems with the not
> > unreasonable expectation of being able to run a supported SMP linux on
> > the systems will get irritated enough to bag linux altogether and RH in
> > particular?  You've already heard from at least ONE user who bought from
> > Dell with precisely this expectation and was astounded to learn that
> > after all the media hooraw over RH linux on Dell servers, only one
> > processor worked of his many and neither Dell nor RH was prepared to
> > help him get the rest going!
> 
> You're arguing too many issues here.  The question is whether or not
> 2.0 SMP is supportable or not.  We don't really think it is, especially
> given how close we are to 2.2 SMP shipping.

<Stunned :-o>  

You don't think 2.0.>>36<< is supportable, although it is so stable it
is basically moribund and abandoned, but you think that 2.2.(x \approx
5) (a brand new "stable" release) is, in spite of the fact that your
existing 5.2 release runs 2.0.36 flawlessly in nearly all SMP boxes in
existence (including the Dells under primary discussion, never forget!)
while installing 2.2.x requires upgrading a half dozen key systems
components irreversibly, some of which are >>still<< broken for 2.2 in
the most recent "experimental" 2.2.x RPM's?  Brother, we have very
different definitions of the word supportable and stable...

I'm >>glad<< RH is going to support 2.2.x and, at last, SMP.  I think
that it is >>silly<< for Red Hat to not help Dell put a functional SMP
version of 5.2 on their high end SMP servers in the meantime.  I think
the data (NOT opinion, >>data<<) above makes it very clear that any
argument about stability or supportability is specious.  Most Dell SMP
server purchasers would probably consider well over 1000 days continuous
uptime in aggregate acceptable.  On the other hand, permitting Dell to
distribute RH 5.2 with UP kernels on SMP systems makes Dell look stupid
and RH look bad.  Or is it the other way around?
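
For anyone who wants to check the aggregate figure, here is a minimal
sketch of the arithmetic in Python.  It uses only the numbers reported
earlier in this message for the twelve beowulf nodes (uptimes between
105 and 111 days); the function and variable names are just labels for
the example, and since individual node uptimes were not listed it
computes bounds rather than an exact total.

```python
# Figures taken from the uptime report earlier in this message.
NODES = 12             # beowulf compute nodes on stock 2.0.36 SMP
MIN_UPTIME_DAYS = 105  # shortest reported uptime
MAX_UPTIME_DAYS = 111  # longest reported uptime


def aggregate_bounds(nodes, low, high):
    """Return (lower, upper) bounds on total machine-days of uptime."""
    return nodes * low, nodes * high


lower, upper = aggregate_bounds(NODES, MIN_UPTIME_DAYS, MAX_UPTIME_DAYS)
print(f"aggregate uptime: between {lower} and {upper} machine-days")
```

Even the lower bound is comfortably past 1000 machine-days, and that is
before counting any of the desktop SMP systems.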

Perhaps Red Hat doesn't value their business relationship with Dell.
Perhaps Red Hat doesn't care about, umm, "irritating" the customers who
buy SMP Dell systems to run their ISP or whatever only to find that it
is pre-installed with a UP kernel, that no instructions are provided for
actually using the other processors, and that when they call RH for help
the service people get snooty and say "We don't do SMP".  Perhaps Red
Hat doesn't care about its customers?  I hope not...

Obviously you guys are very well defended on this issue, but in cold,
hard business terms, it would cost Red Hat on the order of ten
man-hours, max, to put together a "custom" 2.0.36 SMP version of the 5.2
CD for Dell's private use.  It would cost you no more time than that to
add a 2.0.36-0.7smp RPM to the CD (and add an "install kernel source?"
and a "beware, try-at-your-own-risk-and-join-linux-smp-for-help install
smp kernel" question to the original install).  It would cost even less
than this to provide a drop-in smp solution (a downloadable RPM) for
those persons who call requesting it, and I'll bet such a thing already
exists anyway (which would reduce the cost still further, but your
service people still need to be directed to use it).  I personally think
that the expected profits from such an investment, in customer
satisfaction and goodwill translated into increased sales on Dell SMP
platforms alone (not to mention the gazillion other SMP server platforms
assembled by folks all over the country), are likely worth it.  But I'm
not a Red Hat manager and RH clearly has problems providing adequate
support as it is, so you could be right not to do it...

> > I know for a fact that Dell can manage an SMP NT install on those
> > particular beasts...however poor it may be.
> 
> So?

Interesting response for somebody selling a product in competition with
NT on multiprocessing servers.  Perhaps Red Hat >>doesn't<< care about
their business relationship with Dell or the multiprocessing server
market...who'd have thought it?  How is Linus going to achieve world
domination if linux users buying the "premier" linux distribution get so
frustrated that they are forced to return to NT?  >>NEVER<< give a
corporate manager who's taken the "risk" of trying linux at all a
>>good<< excuse to go back to NT or you'll never win them back.  It will
be "Been there, done that, no way" forever whenever somebody suggests
reconverting to linux in the future...

   rgb

Robert G. Brown                        http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:[EMAIL PROTECTED]



-
Linux SMP list: FIRST see FAQ at http://www.irisa.fr/prive/mentre/smp-faq/
To Unsubscribe: send "unsubscribe linux-smp" to [EMAIL PROTECTED]
