Hello,
I have a problem with the NCR driver crashing at boot.
I have two slightly broken SCSI devices on one chain: a dying SONY DAT
drive and a TEAC CD-ROM burner. Both sometimes give SCSI timeouts, and it
is possible that the cable is at fault. The crash now occurs reliably: the
machine never boots unless I pull the SCSI cable.
Anyway, even with a cable problem the driver should not crash:
[This is 2.3.99-pre7-8, but an old 2.2.5 and a 2.2.14 kernel crash too.]
ncr53c8xx: at PCI bus 0, device 11, function 0
ncr53c8xx: 53c810 detected
ncr53c810-0: rev 0x2 on pci bus 0 device 11 function 0 irq 10
ncr53c810-0: ID 7, Fast-10, Parity Checking
ncr53c810-0: restart (scsi reset).
scsi0 : ncr53c8xx - version 3.2i
scsi : 1 host.
ncr53c810-0:2: ERROR (0:4) (8-0-0) (0/3) @ (script 38:86830000).
ncr53c810-0: script cmd = c0000004
ncr53c810-0: regdump: da 00 80 03 47 00 02 1f 71 08 00 00 80 00 08 02.
ncr53c810-0: have to clear fifos.
ncr53c810-0: unexpected disconnect
Unable to handle kernel NULL pointer dereference at virtual address 0000001d
c01e1435
*pde = 00000000
Oops: 0000
CPU: 0
EIP: 0010:[<c01e1435>]
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010006
eax: c7f567d8 ebx: 00000000 ecx: 07f582cc edx: 00000077
esi: 00000000 edi: c7f5a000 ebp: c02aff14 esp: c02aff00
ds: 0018 es: 0018 ss: 0018
Process swapper (pid: 0, stackpage=c02af000)
Stack: c7f5a000 c7f5a000 0000000a 00000017 c7f5ac34 c02aff3c c01e267e c7f5a000
c7f5a000 00000246 0000000a c0301580 c0301468 00000001 ffffff00 c02aff54
c01e4a64 c7f5a000 c7ff2be0 04000001 0000000a c02aff74 c010bf9f 0000000a
Call Trace: [<c01e267e>] [<c01e4a64>] [<c010bf9f>] [<c010c122>] [<c0108ac8>]
[<c010aec0>] [<c0108ac8>]
[<c0108aee>] [<c0108b29>] [<c0105000>] [<c010018e>]
Code: 8a 43 1d 84 c0 7d 09 53 57 e8 39 f9 ff ff eb 0b a8 20 74 0a
>>EIP; c01e1435 <ncr_wakeup_done+81/c0> <=====
Trace; c01e267e <ncr_exception+92/478>
Trace; c01e4a64 <ncr53c8xx_intr+28/8c>
Trace; c010bf9f <handle_IRQ_event+33/68>
Trace; c010c122 <do_IRQ+72/c0>
Trace; c0108ac8 <default_idle+0/2c>
Trace; c010aec0 <ret_from_intr+0/20>
Trace; c0108ac8 <default_idle+0/2c>
Trace; c0108aee <default_idle+26/2c>
Trace; c0108b29 <cpu_idle+35/48>
Trace; c0105000 <empty_bad_page+0/1000>
Trace; c010018e <L6+0/2>
Code; c01e1435 <ncr_wakeup_done+81/c0>
00000000 <_EIP>:
Code; c01e1435 <ncr_wakeup_done+81/c0> <=====
0: 8a 43 1d movb 0x1d(%ebx),%al <=====
Code; c01e1438 <ncr_wakeup_done+84/c0>
3: 84 c0 testb %al,%al
Code; c01e143a <ncr_wakeup_done+86/c0>
5: 7d 09 jnl 10 <_EIP+0x10> c01e1445
<ncr_wakeup_done+91/c0>
Code; c01e143c <ncr_wakeup_done+88/c0>
7: 53 pushl %ebx
Code; c01e143d <ncr_wakeup_done+89/c0>
8: 57 pushl %edi
Code; c01e143e <ncr_wakeup_done+8a/c0>
9: e8 39 f9 ff ff call fffff947 <_EIP+0xfffff947> c01e0d7c
<ncr_complete+0/590>
Code; c01e1443 <ncr_wakeup_done+8f/c0>
e: eb 0b jmp 1b <_EIP+0x1b> c01e1450
<ncr_wakeup_done+9c/c0>
Code; c01e1445 <ncr_wakeup_done+91/c0>
10: a8 20 testb $0x20,%al
Code; c01e1447 <ncr_wakeup_done+93/c0>
12: 74 0a je 1e <_EIP+0x1e> c01e1453
<ncr_wakeup_done+9f/c0>
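The faulting instruction is movb 0x1d(%ebx),%al with ebx = 00000000, and the
reported fault address is 0000001d, so the oops is a one-byte read of a
structure member at offset 0x1d through a NULL base pointer. A minimal
user-space sketch of that arithmetic follows. The names fake_ccb and
host_status_addr are hypothetical, and the padding only stands in for
whatever fields really precede the member in the driver's structure:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the driver's CCB structure. Only the position
 * of the byte at offset 0x1d (taken from the oops) is modelled here. */
struct fake_ccb {
    unsigned char pad[0x1d];     /* placeholder for the preceding fields */
    unsigned char host_status;   /* byte read by movb 0x1d(%ebx),%al */
};

/* Address the CPU touches when it reads base->host_status. */
static uintptr_t host_status_addr(uintptr_t base)
{
    /* Sanity check on the assumed layout. */
    assert(offsetof(struct fake_ccb, host_status) == 0x1d);
    return base + offsetof(struct fake_ccb, host_status);
}
```

With a base of 0 this yields 0x1d, which matches the "virtual address
0000001d" line of the oops.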
I had a kgdb setup handy so I checked it with that:
5205 if (cp->host_status & HS_DONEMASK)
(gdb) p cp
$1 = 0x0
(gdb) l
5200 cpu_to_scr(NCB_SCRIPT_PHYS (np, done_plug));
5201 MEMORY_BARRIER();
5202 np->scripth->done_queue[5*i + 4] =
5203 cpu_to_scr(NCB_SCRIPT_PHYS (np, done_end));
5204
5205 if (cp->host_status & HS_DONEMASK)
5206 ncr_complete (np, cp);
5207 else if (cp->host_status & HS_SKIPMASK)
5208 ncr_ccb_skipped (np, cp);
5209
cp is clearly NULL when it shouldn't be. I tried to handle this case
like an empty cp, with a break, but that led to an endless loop in the driver.
I'm not exactly sure why cp is NULL, but it looks like a bug.
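For illustration only, here is a user-space sketch of a done-queue drain
loop that tolerates a NULL entry by consuming the slot and moving on
instead of dereferencing it. This is not the driver's code: fake_ccb,
drain_stats and drain_done_queue are hypothetical names, and the flat
array is a simplified model of the queue. The mask values 0x80 and 0x20
are inferred from the disassembly (the sign-bit test and testb $0x20).
Whether skipping a NULL entry is the right fix in the real driver is an
open question; it would hide the bug rather than explain why cp is NULL.

```c
#include <assert.h>
#include <stddef.h>

/* Mask values inferred from the disassembly, not from the driver headers. */
#define HS_DONEMASK 0x80
#define HS_SKIPMASK 0x20

/* Hypothetical stand-in for a CCB; only host_status matters here. */
struct fake_ccb {
    unsigned char host_status;
};

/* Counters so the caller can see what the loop did. */
struct drain_stats {
    int completed;
    int skipped;
    int null_entries;
};

/* Walk every slot once. Each slot is cleared before it is examined, so a
 * NULL entry cannot be revisited and the loop always terminates. */
static struct drain_stats drain_done_queue(struct fake_ccb *queue[], size_t len)
{
    struct drain_stats st = {0, 0, 0};

    assert(queue != NULL);
    for (size_t i = 0; i < len; i++) {
        struct fake_ccb *cp = queue[i];

        queue[i] = NULL;                 /* consume the slot */
        if (cp == NULL) {
            st.null_entries++;           /* guard: never dereference NULL */
            continue;
        }
        if (cp->host_status & HS_DONEMASK)
            st.completed++;              /* would call ncr_complete() */
        else if (cp->host_status & HS_SKIPMASK)
            st.skipped++;                /* would call ncr_ccb_skipped() */
    }
    return st;
}
```

Called on a four-slot queue holding one done entry, one skipped entry and
two NULL pointers, the function reports one completion, one skip and two
NULL entries, and leaves every slot cleared.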
-Andi