Just for the record, I've been up and running for 6.5 days now,
which never before happened under SMP. All I did was disable
SMP (running only one of the two processors, but since load is so
incredibly low, it really doesn't matter to me) and upgrade
to 2.2.12 (the RAID people told me raid0145-19990824 patched fine
against 2.2.12, so that's why I'm at 2.2.12 now).
I don't feel the 2.2.12 upgrade is what solved the problem, so
all I can seem to blame the hardlockups on is the SMP support in
the kernel. One reply mentioned double-checking to make sure
the stepping of the two processors was identical; it was. I'm
not sure what else it could be. The machine, I reiterate, ran
perfectly stable under SCO UNIX, so I don't feel it is the hardware.
The main goal here was to get the system stable. I feel it is. But
if anyone has any ideas as to why SMP would cause the hardlockups,
or if there is anything I can try to help out the community and track
down what would cause this problem, be sure to let me know.
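
For anyone who wants to chase this further, here is a rough sketch of
the "dump system information to a serial terminal" idea from my original
post. It is only an illustration, not something I have run on this box:
the function names are made up for the example, the use of Python is my
own choice, and /dev/ttyS0 as the target is an assumption (any path
that survives the lockup, such as a file on another machine, would do).
Each pass appends one timestamped line with the load averages and pushes
it out immediately, so the last line received before the silence shows
roughly when the machine froze and how busy it was.

```python
import os
import sys
import time


def heartbeat_line(now=None):
    """Build one timestamped status line from the current load averages."""
    if now is None:
        now = time.time()
    load1, load5, load15 = os.getloadavg()
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(now))
    return "%s load=%.2f/%.2f/%.2f" % (stamp, load1, load5, load15)


def write_heartbeat(path, line):
    """Append the line to path and try to force it out before returning."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
    try:
        os.write(fd, (line + "\n").encode("ascii"))
        try:
            os.fsync(fd)
        except OSError:
            # Character devices such as serial ports may not support fsync.
            pass
    finally:
        os.close(fd)


def run(path, interval=10.0, count=None):
    """Write a heartbeat every `interval` seconds; count=None runs forever."""
    written = 0
    while count is None or written < count:
        write_heartbeat(path, heartbeat_line())
        written += 1
        if count is None or written < count:
            time.sleep(interval)
    return written


if __name__ == "__main__" and len(sys.argv) > 1:
    # Example (hypothetical script name and target): heartbeat.py /dev/ttyS0
    run(sys.argv[1])
```

The file is reopened on every pass on purpose, so nothing sits in a
buffer when the box dies. A shorter interval narrows down the time of the
lockup at the cost of more output to read through afterwards.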
Thanks for everyone's help.
::: Jason A. Diegmueller
::: Microsoft Certified Systems Engineer
::: 513/542-1500 WORK // [EMAIL PROTECTED]
::: Systems Administrator, Bertke Systems Innovations
: -----Original Message-----
: From: Jason A. Diegmueller
: Sent: Wednesday, September 15, 1999 1:08 PM
: To: 'Heinz Christian'; 'Tom Kunz'; 'David Holl'; 'Michael Sloan';
: 'Mike Black'; 'Jean-Francois Patenaude'; Jay Klute
: Cc: [EMAIL PROTECTED]; '[EMAIL PROTECTED]'
: Subject: RE: Linux box locking up ..
:
:
: In response to the HP Netserver LXe Pro lockup issue I had before,
: and with the responses I've gotten so far, here is my current
: plan of attack. Any further input or suggestions are as always
: welcome:
:
: 1. I'm compiling without SMP support as we speak. The most this
: thing has made it is 3 full days without locking up solid,
: so if it runs for a week I'll blame it on 2.2.11 SMP, I guess.
: NOTE: I can't go newer than 2.2.11 at this time due to the fact
: the latest released raid0145 patch is for 2.2.11. RAID
: people, I haven't tried it yet: Will it patch 2.2.12 without
: too much hassle?
: 2. If still getting lockups without SMP, I'm going to try pulling
: the Intel EEPro and let a 3c905B have a whack at it.
: 3. I just thought of this, but the 78xx firmware is probably still
: the original from when the machine was purchased in 1996. I
: guess I could broach the firmware-update route. I feel stupid
: for not having done this yet.
: 4. If still no luck, I could possibly wheel a different SCSI
: controller out there. Admittedly, I've never used anything but
: IDE or 2940UWs in Linux boxes; what other cards are fairly well-
: supported and reliable? I'll try disabling onboard and going
: with a different controller if we get all the way down here to
: the fourth point. [grin]
:
: Thank you to all who have assisted so far, and in the future.
:
: ::: Jason A. Diegmueller
: ::: Microsoft Certified Systems Engineer
: ::: 513/542-1500 WORK // [EMAIL PROTECTED]
: ::: Systems Administrator, Bertke Systems Innovations
:
: : -----Original Message-----
: : From: Mike Black [mailto:[EMAIL PROTECTED]]
: : Sent: Wednesday, September 15, 1999 10:58 AM
: : To: Jason A. Diegmueller
: : Cc: [EMAIL PROTECTED]
: : Subject: Re: Linux box locking up ..
: :
: :
: : I've got a similar setup with Dual PIII/450, AIC7880 on-board, 2940U2W,
: : IDE root drive, and it locks up on me real quickly when doing heavy disk
: : i/o on the RAID set. I'm isolating this system now to hopefully debug
: : this problem.
: :
: : ________________________________________
: : Michael D. Black Principal Engineer
: : [EMAIL PROTECTED] 407-676-2923,x203
: : http://www.csi.cc Computer Science Innovations
: : http://www.csi.cc/~mike My home page
: : FAX 407-676-2355
: : ----- Original Message -----
: : From: Jason A. Diegmueller <[EMAIL PROTECTED]>
: : To: <unlisted-recipients:; (no To-header on input)>
: : Cc: <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>
: : Sent: Wednesday, September 15, 1999 10:43 AM
: : Subject: Linux box locking up ..
: :
: :
: : I must say I've never seen this in my entire life, so
: : I wanted to get some input. This is sent to both the
: : linux-admin and linux-raid lists.
: :
: : I have a customer/friend (yes, the two-in-one combo that
: : is often noted for being dangerous =) who recently upgraded
: : an old SCO setup they had to SCO Openserver 5.0.5. This
: : included selling him new hardware and the works. This meant
: : that his old HP Netserver LXe Pro (which is a gorgeous machine)
: : became a spare server for us to utilize.
: :
: : Being the avid Linux geek I am, I immediately dumped Linux
: : on there. The Mylex DAC960 was not supported (the card
: : is the old 2.x firmware variety, and HP wanted money to upgrade
: : us to the 3.x series) so I could not utilize the hardware
: : RAID. Instead, I went with software.
: :
: : It is a dual-processored machine (capable of 4, only utilizing
: : two PPro 200's at this time, 512k cache each), so I have
: : compiled it with SMP. So at the current time, it basically is:
: : linux-2.2.11-SMP with raid0145-990824 with four 2gb drives
: : in a RAID-5. The SCSI bus is onboard Adaptec 78xx. The
: : network card is an Intel Etherexpress PRO.
: :
: : The problem? It locks up. Solid. I've never in my life
: : seen a Linux box just lock up, with no hints anywhere in
: : logfiles. On the other hand, I've never gotten my hands on
: : hardware this "big" (This sucker was $31k retail when they
: : bought it). The machine is currently virtually unused (other
: : than qpopper for POP mail); SAMBA is set up on it, but is
: : completely unutilized at this time.
: :
: : It seems to lock hard every few days. Maybe 3 or 4? I see
: : no correlation between activity (i.e., users doing something) and
: : lockups, but am willing to dig a little deeper if someone has
: : an idea.
: :
: : I was wondering two things:
: : A. Are there any known incompatibilities with any of this
: : hardware? I've seen some mentions of aic78xx, SMP, and
: : raid causing problems. Is this what I'm bumping into?
: : B. Is there anything I can do to figure out WHAT is causing
: : the hard lockups? Again, no hints in /var/log/messages or
: : anywhere else. Possibly a serial cable to a dumb terminal
: : constantly dumping system information?
: :
: : Any information or clues would be more than appreciated. Replies
: : directly to the list are more than fine; I subscribe to both.
: :
: : Thanks.
: :
: : ::: Jason A. Diegmueller
: : ::: Microsoft Certified Systems Engineer
: : ::: 513/542-1500 WORK // [EMAIL PROTECTED]
: : ::: Systems Administrator, Bertke Systems Innovations
: :
: