I have an oops to report, but there are some mitigating circumstances. I
am also using the DAC960 patch for that hardware raid card. So my kernel
is different from a straight 2.2.4 + raid0145-19990309 system.
To patch, I let patch do its work and cleaned up the rejects, which
were minor in every case and 'obvious' to fix for the most part, although
I could have made a mistake, of course. I applied raid0145 to vanilla
2.2.4 (and committed and tagged in CVS), then applied the DAC960 patch.
I can produce my diffs against vanilla 2.2.4 (that's just the SW RAID
part) and then, incrementally, the DAC960 diffs. The entire thing's in
CVS...
The DAC960 patch makes some changes to genhd and ll_rw_blk, so maybe
there's a conflict. On the other hand, maybe this is an outright bug.
Here's the processed oops:
Unable to handle kernel NULL pointer dereference at virtual address 00000008
current->tss.cr3 = 06af5000, %cr3 = 06af5000
*pde = 00000000
Oops: 0000
CPU: 1
EIP: 0010:[<c01766ee>]
EFLAGS: 00010212
eax: d08bb170 ebx: 00000000 ecx: 00000020 edx: 0000000a
esi: 00000010 edi: 77767574 ebp: 00000004 esp: cd9b7dd0
ds: 0018 es: 0018 ss: 0018
Process bonnie_linux_gl (pid: 7899, process nr: 97, stackpage=cd9b7000)
Stack: c19ef150 c19ef14e 00000246 d08b9000 d08bb170 d08b7000 00000060 00000002
       c016eff9 c6957d60 00000900 c19ef14e c19ef150 00000002 00000001 cd9b7e7c
       00000009 00000004 c016b095 00000900 c19ef14e c19ef150 00000002 00000000
Call Trace: [<d08b9000>] [<d08bb170>] [<d08b7000>] [<c016eff9>] [<c016b095>] [<c0129818>] [<c0129a89>]
            [<c011e4cf>] [<c011e8f2>] [<c011ecc4>] [<c011ec10>] [<c0126fda>] [<c0108ccc>] [<c010002b>]
Code: 8b 53 08 03 13 39 d7 7c 25 8b 58 04 85 db 75 1e 57 68 53 f9
>>EIP: c01766ee <raid0_map+8a/118>
Trace: d08b9000 <driver_template+e270/392bc>
Trace: d08bb170 <driver_template+103e0/392bc>
Trace: d08b7000 <driver_template+c270/392bc>
Trace: c016eff9 <md_map+41/4c>
Trace: c016b095 <ll_rw_block+e9/21c>
Trace: c0129818 <brw_page+2c8/3bc>
Trace: c0129a89 <generic_readpage+e9/f8>
Trace: c011e4cf <try_to_read_ahead+10f/128>
Code: c01766ee <raid0_map+8a/118> 00000000 <_EIP>: <===
Code: c01766ee <raid0_map+8a/118>  0: 8b 53 08       movl 0x8(%ebx),%edx <===
Code: c01766f1 <raid0_map+8d/118>  3: 03 13          addl (%ebx),%edx
Code: c01766f3 <raid0_map+8f/118>  5: 39 d7          cmpl %edx,%edi
Code: c01766f5 <raid0_map+91/118>  7: 7c 25          jl c017671c <raid0_map+b8/118>
Code: c01766f7 <raid0_map+93/118>  9: 8b 58 04       movl 0x4(%eax),%ebx
Code: c01766fa <raid0_map+96/118>  c: 85 db          testl %ebx,%ebx
Code: c01766fc <raid0_map+98/118>  e: 75 1e          jne c017671c <raid0_map+b8/118>
Code: c01766fe <raid0_map+9a/118> 10: 57             pushl %edi
Code: c01766ff <raid0_map+9b/118> 11: 68 53 f9 00 00 pushl $0xf953
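For what it's worth, the decoded code seems to make the failure mode
fairly clear: %ebx is 0, and the faulting movl reads offset 8 off it,
which matches the fault address 00000008. The trace is the read-ahead
path: generic_readpage -> brw_page -> ll_rw_block -> md_map ->
raid0_map. Below is a minimal C sketch of the kind of strip-zone lookup
that would fault this way; the struct layout, field names, and printk
string are my reconstruction from the disassembly, not lifted from the
raid0145 source, so take them as guesses:

    /* Sketch only: offsets guessed from the oops (zone_offset at 0,
     * size at 8 in struct strip_zone; zone1 at 4 in struct raid0_hash). */
    struct strip_zone {
            unsigned long zone_offset;      /* addl (%ebx),%edx              */
            unsigned long dev_offset;       /* filler, to put size at +8     */
            unsigned long size;             /* movl 0x8(%ebx),%edx <=== oops */
    };

    struct raid0_hash {
            struct strip_zone *zone0;       /* this is the NULL in %ebx      */
            struct strip_zone *zone1;       /* movl 0x4(%eax),%ebx           */
    };

    /* blk in %edi, hash in %eax, zone pointer in %ebx: */
    if (blk < hash->zone0->zone_offset + hash->zone0->size) /* cmpl; jl */
            zone = hash->zone0;
    else if (hash->zone1)                   /* testl %ebx,%ebx; jne     */
            zone = hash->zone1;
    else                                    /* real string unknown; it  */
            printk("raid0: bad block %d\n", blk); /* lives at $0xf953   */

If that reading is right, the bug isn't the bounds check itself but
that hash->zone0 came back NULL, i.e. the hash table raid0 builds has
a hole in it, or blk indexed past its end.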
A short description of the system and what it was doing: the system is
a dual PII 450MHz, with the DAC960 running a RAID5 root filesystem, and
an aic7xxx with six Ultra2 Cheetahs running RAID0 with the SW RAID
patch above. At the time of the crash, one user was uncompressing a
large file on the RAID5 (DAC960) volume, which was saturating that I/O
channel.
I was running bonnie on the SW RAID to saturate that I/O channel and
utilize the other CPU. I was trying to see whether bonnie's throughput
would be significantly affected by I/O on another device in a dual-CPU
system. Well... I oopsed instead.
If there is anything I can do, let me know.
--
/==============================\
| David Mansfield |
| [EMAIL PROTECTED] |
\==============================/