NCR crash on unexpected disconnect

Andi Kleen Wed, 10 May 2000 07:05:43 -0700

Hello,

I have some problems with the NCR driver crashing at boot.
I have two slightly broken SCSI devices on one chain: a dying SONY DAT
and a TEAC CD-ROM burner. Both sometimes give SCSI timeouts. It is possible
that it is a cable problem. The crash now occurs reliably: unless I pull
the SCSI cable, the machine never boots up.

Anyway, even with a cable problem the driver should not crash:

[This is 2.3.99-pre7-8, but an old 2.2.5 and a 2.2.14 kernel crash too.]

ncr53c8xx: at PCI bus 0, device 11, function 0
ncr53c8xx: 53c810 detected 
ncr53c810-0: rev 0x2 on pci bus 0 device 11 function 0 irq 10
ncr53c810-0: ID 7, Fast-10, Parity Checking
ncr53c810-0: restart (scsi reset).
scsi0 : ncr53c8xx - version 3.2i
scsi : 1 host.
ncr53c810-0:2: ERROR (0:4) (8-0-0) (0/3) @ (script 38:86830000).
ncr53c810-0: script cmd = c0000004
ncr53c810-0: regdump: da 00 80 03 47 00 02 1f 71 08 00 00 80 00 08 02.
ncr53c810-0: have to clear fifos.
ncr53c810-0: unexpected disconnect
Unable to handle kernel NULL pointer dereference at virtual address 0000001d
c01e1435
*pde = 00000000
Oops: 0000
CPU:    0
EIP:    0010:[<c01e1435>]
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010006
eax: c7f567d8   ebx: 00000000   ecx: 07f582cc   edx: 00000077
esi: 00000000   edi: c7f5a000   ebp: c02aff14   esp: c02aff00
ds: 0018   es: 0018   ss: 0018
Process swapper (pid: 0, stackpage=c02af000)
Stack: c7f5a000 c7f5a000 0000000a 00000017 c7f5ac34 c02aff3c c01e267e c7f5a000 
       c7f5a000 00000246 0000000a c0301580 c0301468 00000001 ffffff00 c02aff54 
       c01e4a64 c7f5a000 c7ff2be0 04000001 0000000a c02aff74 c010bf9f 0000000a 
Call Trace: [<c01e267e>] [<c01e4a64>] [<c010bf9f>] [<c010c122>] [<c0108ac8>] [<c010aec0>] [<c0108ac8>] [<c0108aee>] [<c0108b29>] [<c0105000>] [<c010018e>]
Code: 8a 43 1d 84 c0 7d 09 53 57 e8 39 f9 ff ff eb 0b a8 20 74 0a 

>>EIP; c01e1435 <ncr_wakeup_done+81/c0>   <=====
Trace; c01e267e <ncr_exception+92/478>
Trace; c01e4a64 <ncr53c8xx_intr+28/8c>
Trace; c010bf9f <handle_IRQ_event+33/68>
Trace; c010c122 <do_IRQ+72/c0>
Trace; c0108ac8 <default_idle+0/2c>
Trace; c010aec0 <ret_from_intr+0/20>
Trace; c0108ac8 <default_idle+0/2c>
Trace; c0108aee <default_idle+26/2c>
Trace; c0108b29 <cpu_idle+35/48>
Trace; c0105000 <empty_bad_page+0/1000>
Trace; c010018e <L6+0/2>
Code;  c01e1435 <ncr_wakeup_done+81/c0>
00000000 <_EIP>:
Code;  c01e1435 <ncr_wakeup_done+81/c0>   <=====
   0:   8a 43 1d                  movb   0x1d(%ebx),%al   <=====
Code;  c01e1438 <ncr_wakeup_done+84/c0>
   3:   84 c0                     testb  %al,%al
Code;  c01e143a <ncr_wakeup_done+86/c0>
   5:   7d 09                     jnl    10 <_EIP+0x10> c01e1445 <ncr_wakeup_done+91/c0>
Code;  c01e143c <ncr_wakeup_done+88/c0>
   7:   53                        pushl  %ebx
Code;  c01e143d <ncr_wakeup_done+89/c0>
   8:   57                        pushl  %edi
Code;  c01e143e <ncr_wakeup_done+8a/c0>
   9:   e8 39 f9 ff ff            call   fffff947 <_EIP+0xfffff947> c01e0d7c <ncr_complete+0/590>
Code;  c01e1443 <ncr_wakeup_done+8f/c0>
   e:   eb 0b                     jmp    1b <_EIP+0x1b> c01e1450 <ncr_wakeup_done+9c/c0>
Code;  c01e1445 <ncr_wakeup_done+91/c0>
  10:   a8 20                     testb  $0x20,%al
Code;  c01e1447 <ncr_wakeup_done+93/c0>
  12:   74 0a                     je     1e <_EIP+0x1e> c01e1453 <ncr_wakeup_done+9f/c0>
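
The faulting address is consistent with the disassembly: with ebx = 0, the instruction `movb 0x1d(%ebx),%al` reads from address 0x1d, which is the address reported in the oops. The arithmetic can be sketched as follows. The structure `fake_ccb` and the helper `host_status_addr` are hypothetical stand-ins; only the 0x1d offset is taken from the oops, and the real ccb layout is not shown in this mail.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the driver's ccb. Only the 0x1d offset of the
 * byte being read comes from the oops; the padding is invented. */
struct fake_ccb {
    unsigned char pad[0x1d];
    unsigned char host_status;   /* assumed to be the byte read at 0x1d */
};

/* Address the CPU touches when it reads host_status through `base`. */
static uintptr_t host_status_addr(uintptr_t base)
{
    return base + offsetof(struct fake_ccb, host_status);
}
```

With a base of 0 (a NULL pointer), `host_status_addr(0)` evaluates to 0x1d, matching "virtual address 0000001d" above.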

I had a kgdb setup handy so I checked it with that:

5205                    if (cp->host_status & HS_DONEMASK)
(gdb) p cp
$1 = 0x0
(gdb) l
5200                                    cpu_to_scr(NCB_SCRIPT_PHYS (np, done_plug));
5201                    MEMORY_BARRIER();
5202                    np->scripth->done_queue[5*i + 4] =
5203                                    cpu_to_scr(NCB_SCRIPT_PHYS (np, done_end));
5204    
5205                    if (cp->host_status & HS_DONEMASK)
5206                            ncr_complete (np, cp);
5207                    else if (cp->host_status & HS_SKIPMASK)
5208                            ncr_ccb_skipped (np, cp);
5209    

cp is clearly NULL when it shouldn't be. I tried to handle this case
like an empty cp, with a break, but that led to an endless loop in the driver.

I'm not exactly sure why cp is NULL, but it looks like a bug.
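
For illustration only, here is a minimal sketch of one general way to tolerate a NULL entry in a completion queue without stalling: consume the slot before testing the pointer, so a bad entry can never be revisited. This is not the driver's code. All names (`demo_ccb`, `demo_queue`, `demo_drain`, `DONE_FLAG`) are hypothetical, and whether this matches what went wrong with the break attempt is an assumption. It would also only mask the problem, not explain why cp is NULL.

```c
#include <stddef.h>

#define QUEUE_LEN 4      /* hypothetical ring size */
#define DONE_FLAG 0x80   /* hypothetical "command completed" bit */

struct demo_ccb {
    unsigned char host_status;
};

struct demo_queue {
    struct demo_ccb *slot[QUEUE_LEN];
    unsigned head;       /* next slot to inspect */
    unsigned pending;    /* slots reported as done but not yet drained */
    unsigned completed;  /* entries that had DONE_FLAG set */
    unsigned bogus;      /* NULL entries that were skipped */
};

/* Drain every reported slot. The slot is cleared and the indices are
 * advanced before cp is examined, so a NULL entry is counted and skipped
 * and the loop always makes progress. */
static void demo_drain(struct demo_queue *q)
{
    while (q->pending > 0) {
        struct demo_ccb *cp = q->slot[q->head];

        q->slot[q->head] = NULL;
        q->head = (q->head + 1) % QUEUE_LEN;
        q->pending--;

        if (cp == NULL) {
            q->bogus++;
            continue;
        }
        if (cp->host_status & DONE_FLAG)
            q->completed++;
    }
}
```

The loop terminates because `pending` decreases on every iteration, whether or not the entry was valid.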


-Andi

-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to [EMAIL PROTECTED]