I've had a bear of a time trying to get my brand new pIIIdme
stable. I've got a pair of 667 MHz PIII coppermines + 256 MB PC133
RAM on board. I've disabled as much of the APM stuff as I could find a
way to turn off and I've disabled the onboard sound.
I originally tried to replace my old Tyan SMP board with this one but
it wasn't stable so I built up a stripped system with nothing but an
AGP Voodoo3, one IDE HD, an IDE CDROM, and a floppy disk to minimize
the possible problems. Oh yes, and a ps/2 mouse (Logitech). I
installed the current version of Slackware on it (slackware-current as
of May 28, 2000).
I've had no luck with 2.2.14, 2.2.15, or 2.2.16pre5, in either MP or SP
builds, with or without 'noapic'. By no luck, I mean apparently random
lockups. 2.4.0-test1 still locks up, but not as often. 2.4.0-test1-ac4
seems to do the best, but I'm still having problems when I do a lot of
compiling (e.g. kernel + openssl in another login). 'noapic' with ac4
seemed worse.
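For anyone who wants to put the same kind of load on a box, here is a rough Python sketch of it. It is only an approximation of running the builds by hand from two logins: the `stress` and `run_build` names, the make targets, and the /usr/src/openssl path are placeholders I made up for illustration, so adjust them to your own trees.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

# Hypothetical build jobs: (command, working directory). The targets and the
# openssl path are assumptions, not a record of the exact commands used.
JOBS = [
    (["make", "clean", "bzImage"], "/usr/src/linux"),
    (["make", "clean", "all"], "/usr/src/openssl"),
]


def run_build(cmd, cwd=None):
    """Run one build and return its exit status (negative = killed by a signal)."""
    result = subprocess.run(cmd, cwd=cwd,
                            stdout=subprocess.DEVNULL,
                            stderr=subprocess.DEVNULL)
    return result.returncode


def stress(jobs, rounds):
    """Run all jobs concurrently, `rounds` times over.

    Returns a list of (round, command, status) for every build that
    exited with a non-zero status.
    """
    failures = []
    for n in range(1, rounds + 1):
        # One worker thread per job so the builds really overlap.
        with ThreadPoolExecutor(max_workers=len(jobs)) as pool:
            statuses = list(pool.map(lambda job: run_build(*job), jobs))
        for (cmd, _cwd), status in zip(jobs, statuses):
            if status != 0:
                failures.append((n, cmd, status))
    return failures
```

Calling `stress(JOBS, rounds=10)` would run both builds side by side ten times and list the ones that broke. Note that a compiler dying on signal 11 normally shows up here only as a non-zero exit from make, not as a negative status.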
On boot, I still get the IO-APIC warning...
May 29 03:23:53 thor kernel: IO APIC #2......
May 29 03:23:53 thor kernel: .... register #00: 02000000
May 29 03:23:53 thor kernel: ....... : physical APIC id: 02
May 29 03:23:53 thor kernel: .... register #01: 00170020
May 29 03:23:53 thor kernel: ....... : max redirection entries: 0017
May 29 03:23:53 thor kernel: ....... : IO APIC version: 0020
May 29 03:23:53 thor kernel: WARNING: unexpected IO-APIC, please mail
May 29 03:23:53 thor kernel: to [EMAIL PROTECTED]
May 29 03:23:53 thor kernel: .... register #02: 00000000
May 29 03:23:53 thor kernel: ....... : arbitration: 00
May 29 03:23:53 thor kernel: .... IRQ redirection table:
which repeats for IO APIC #3. I'll be glad to send the full message on
request but it seems to have hit the list a number of times.
ac4 seemed pretty stable, so I decided to push it by compiling both
the kernel and openssl simultaneously and repeatedly. I got a lot of
signal 11s and an error 33. Eventually, I got a message in the log...
May 29 02:38:36 thor kernel: Unable to handle kernel paging request at virtual address 00daa440
May 29 02:38:36 thor kernel: *pde = 00000000
which also turned up a console message...
root@thor# Unable to handle kernel paging request at virtual address 00daa440
*pde = 00000000
Oops: 0002
CPU: 0
EIP: 0010:[<c011f4e6>]
EFLAGS: 00010246
eax: cd99a100 ebx: cbf8c000 ecx: bffe4000 edx: 00000000
esi: 00daa440 edi: 0000000b ebp: cbf8dfbc esp: cbf8df08
ds: 0018 es: 0018 ss: 0018
Process cc1 (pid: 28894, stackpage=cbf8d000)
Stack: cbf8c000 0000000b cbf8c000 c010b9c0 0000000b cbf8c000 086b1638 00000da7
0000000b ce78998c cbf8df3c cbf8c648 cbf8dfc4 0000000b 00000000 00000000
00000000 00000000 00000077 ce841c80 ce37dac0 00000004 40000000 cc281b00
Call Trace: [<c010b9c0>] [<c012784d>] [<c0126967>] [<c010bc7d>] [<c010bba4>]
Code: f0 ff 0e 0f 94 c0 84 c0 0f 84 a9 00 00 00 8b 46 08 50 e8 2f
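For what it's worth, the "Oops: 0002" value is the i386 page-fault error code printed in hex. As far as I understand it, the low three bits say whether the page was present, whether the access was a write, and whether the CPU was in user mode. A small Python sketch of that decoding follows; the `decode_oops_error` name is just something I made up for illustration.

```python
def decode_oops_error(code):
    """Decode the low three bits of an i386 page-fault error code."""
    return {
        # bit 0: 0 = page not present, 1 = protection violation
        "cause": "protection violation" if code & 0x1 else "page not present",
        # bit 1: 0 = read access, 1 = write access
        "access": "write" if code & 0x2 else "read",
        # bit 2: 0 = fault taken in kernel mode, 1 = in user mode
        "mode": "user" if code & 0x4 else "kernel",
    }


# The oops prints the code in hex, so parse it as base 16.
print(decode_oops_error(int("0002", 16)))
# → {'cause': 'page not present', 'access': 'write', 'mode': 'kernel'}
```

So this one reads as a kernel-mode write to a page that wasn't present, which fits the "*pde = 00000000" line.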
So, I kept on compiling and on one broken compile I got a console
message...
root@thor# memory.c:83: bad pmd 00ff0000.
memory.c:83: bad pmd 00996000.
Of course that didn't stop me so I kept going and finally the system
locked up. I've copied, by hand, as much of the console message as I
could see since it appeared to have scrolled off the top. Please
forgive any typos...
cff0f400 bad magic 1a4b00bb (should be 8080840), wq bug forcing oops.
00000004 kernel BUG at /usr/src/linux/include/asm/semaphore.h:109!
caf62000 bad magic 1a4b00bb (should be 8080840), ce37dac0 wq bug forcing oops.
08554000 kernel BUG at /usr/src/linux/include/asm/semaphore.h:109!
08553000 bad magic 1a4b00bb (should be 8080840), c0126967 wq bug forcing oops.
08552000 kernel BUG at /usr/src/linux/include/asm/semaphore.h:109!
00001000 bad magic 1a4b00bb (should be 8080840), wq bug forcing oops.
caf62000 kernel BUG at /usr/src/linux/include/asm/semaphore.h:109!
caf62000 bad magic 1a4b00bb (should be 8080840), 0000004a wq bug forcing oops.
00000001 kernel BUG at /usr/src/linux/include/asm/semaphore.h:109!
caf62000 bad magic 1a4b00bb (should be 8080840), c02ca420 wq bug forcing oops.
bfff950c kernel BUG at /usr/src/linux/include/asm/semaphore.h:109!
c0106c41 bad magic 1a4b00bb (should be 8080840),
Call Trace: wq bug, forcing oops.
[<c0126962>] kernel BUG at /usr/src/linux/include/asm/semaphore.h:109!
[<c010bc41>] bad magic 1a4b00bb (should be 8080840),
Code: wq bug, forcing oops.
83 kernel BUG at /usr/src/linux/include/asm/semaphore.h:109!
79 bad magic 1a4b00bb (should be 8080840), f8 wq bug forcing oops.
00 kernel BUG at /usr/src/linux/include/asm/semaphore.h:109!
75 bad magic 1a4b00bb (should be 8080840), 68 wq bug forcing oops.
8b kernel BUG at /usr/src/linux/include/asm/semaphore.h:109!
7d
I'm assuming that this has something to do with semaphores and yes, I
did create /var/shm and add it to fstab.
If it were only the signal 11s, I'd assume I have bad RAM, bad CPUs, or
a problem motherboard, but this appears to be a kernel issue. Of course,
I'm still suspicious of the hardware too.
I've seen a lot of problems and some success reported with this board
(here and on USENET), so I'm wondering if I'm missing some magic in the
BIOS, if my RAM might be bad, or if this is a well-known problem. I
would appreciate any help here since I have to decide whether or not
to send this back to Micro Pro. For what it's worth, they sent me the
pIIIdme with the CPUs and RAM installed.
Thanks,
Jon
--
Jonathan Hartzog
[EMAIL PROTECTED] || [EMAIL PROTECTED]
http://www.w00f.com/~jhartzog/
perl -e 'print reverse qw/H P A J/'