Oops with 2.4.0: race in aic7xxx, faulty cpu or voodoo magic

Ion Badulescu Wed, 10 Jan 2001 04:18:11 -0800
This is the 4th time I'm getting this oops with 2.4.0, and it always
happens when something (usually updatedb) is stressing my SCSI disk.

The ksymoops trace is correct, I also ran it against saved versions of
/proc/ksyms and /proc/modules:

ksymoops 0.7c on i586 2.4.0.  Options used
     -V (default)
     -k /proc/ksyms (default)
     -l /proc/modules (default)
     -o /lib/modules/2.4.0/ (default)
     -m /usr/src/linux/System.map (default)

Warning: You did not tell me where to find symbol information.  I will
assume that the log matches the kernel and modules that are running
right now and I'll use the default options above for symbol resolution.
If the current kernel and/or modules do not match the log, you can get
more accurate output by telling me the kernel version and where to find
map, modules, ksyms etc.  ksymoops -h explains the options.

Unable to handle kernel NULL pointer dereference at virtual address 00000001
c483a79e
*pde = 00000000
Oops: 0002
CPU:    0
EIP:    0010:[<c483a79e>]
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010006
eax: c32e9ac0   ebx: c337e020   ecx: c11fb878   edx: 00000000
esi: c11fb8d8   edi: c11fb878   ebp: 00000004   esp: c01fde1c
ds: 0018   es: 0018   ss: 0018
Process swapper (pid: 0, stackpage=c01fd000)
Stack: c483b212 c11fb878 c32e9ac0 c337e000 c32e9ac0 c11fb878 00000009 00000000
       c01a959d c2305360 c11fb8c8 00000020 c0634500 c11fb8e8 c312feb4 c11fb878
       c48459a1 c485b087 c11fb8c8 c337e000 c4845dc1 c11fb878 c32e9ac0 c11fb878
Call Trace: [<c483b212>] [<c01a959d>] [<c48459a1>] [<c485b087>] [<c4845dc1>] 
[<c485b091>] [<c4845ebb>]
       [<c484612d>] [<c0109ffd>] [<c010a167>] [<c0108e90>] [<c48c6d92>] [<c0109ffd>] 
[<c010a167>] [<c0107180>]
       [<c0108e90>] [<c0107180>] [<c01071a3>] [<c01071fa>] [<c0105000>] [<c0100191>]
Code: c6 42 01 ff 8b 49 18 8b 11 89 50 08 89 01 8b 41 04 85 c0 75

>>EIP; c483a79e <[aic7xxx]aic7xxx_free_scb+32/50>   <=====
Trace; c483b212 <[aic7xxx]aic7xxx_done+a56/a70>
Trace; c01a959d <udp_queue_rcv_skb+ed/13c>
Trace; c48459a1 <[aic7xxx]aic7xxx_handle_command_completion_intr+49/48c>
Trace; c485b087 <[aic7xxx].bss.end+1848/3821>
Trace; c4845dc1 <[aic7xxx]aic7xxx_handle_command_completion_intr+469/48c>
Trace; c485b091 <[aic7xxx].bss.end+1852/3821>
Trace; c4845ebb <[aic7xxx]aic7xxx_isr+d7/318>
Trace; c484612d <[aic7xxx]do_aic7xxx_isr+31/8c>
Trace; c0109ffd <handle_IRQ_event+31/5c>
Trace; c010a167 <do_IRQ+6b/ac>
Trace; c0108e90 <ret_from_intr+0/20>
Trace; c48c6d92 <[wvlan_cs]wvlan_interrupt+f2/18c>
Trace; c0109ffd <handle_IRQ_event+31/5c>
Trace; c010a167 <do_IRQ+6b/ac>
Trace; c0107180 <default_idle+0/28>
Trace; c0108e90 <ret_from_intr+0/20>
Trace; c0107180 <default_idle+0/28>
Trace; c01071a3 <default_idle+23/28>
Trace; c01071fa <cpu_idle+32/48>
Trace; c0105000 <empty_bad_page+0/1000>
Trace; c0100191 <L6+0/2>
Code;  c483a79e <[aic7xxx]aic7xxx_free_scb+32/50>
00000000 <_EIP>:
Code;  c483a79e <[aic7xxx]aic7xxx_free_scb+32/50>   <=====
   0:   c6 42 01 ff               movb   $0xff,0x1(%edx)   <=====
Code;  c483a7a2 <[aic7xxx]aic7xxx_free_scb+36/50>
   4:   8b 49 18                  mov    0x18(%ecx),%ecx
Code;  c483a7a5 <[aic7xxx]aic7xxx_free_scb+39/50>
   7:   8b 11                     mov    (%ecx),%edx
Code;  c483a7a7 <[aic7xxx]aic7xxx_free_scb+3b/50>
   9:   89 50 08                  mov    %edx,0x8(%eax)
Code;  c483a7aa <[aic7xxx]aic7xxx_free_scb+3e/50>
   c:   89 01                     mov    %eax,(%ecx)
Code;  c483a7ac <[aic7xxx]aic7xxx_free_scb+40/50>
   e:   8b 41 04                  mov    0x4(%ecx),%eax
Code;  c483a7af <[aic7xxx]aic7xxx_free_scb+43/50>
  11:   85 c0                     test   %eax,%eax
Code;  c483a7b1 <[aic7xxx]aic7xxx_free_scb+45/50>
  13:   75 00                     jne    15 <_EIP+0x15> c483a7b3 
<[aic7xxx]aic7xxx_free_scb+47/50>


1 warning issued.  Results may not be reliable.

Clear NULL pointer oops, it seems, right? Well, not quite.
aic7xxx_free_scb looks like this (compiled with egcs-1.1.2):

0000270c <aic7xxx_free_scb>:
    270c:       8b 44 24 08             mov    0x8(%esp,1),%eax
    2710:       8b 4c 24 04             mov    0x4(%esp,1),%ecx
    2714:       c7 40 0c 00 00 00 00    movl   $0x0,0xc(%eax)
    271b:       c7 40 04 00 00 00 00    movl   $0x0,0x4(%eax)
    2722:       c6 40 15 00             movb   $0x0,0x15(%eax)
    2726:       c7 40 20 00 00 00 00    movl   $0x0,0x20(%eax)
    272d:       8b 10                   mov    (%eax),%edx
    272f:       c6 40 14 00             movb   $0x0,0x14(%eax)
    2733:       c6 02 00                movb   $0x0,(%edx)
    2736:       8b 10                   mov    (%eax),%edx
    2738:       c6 42 02 00             movb   $0x0,0x2(%edx)
    273c:       8b 10                   mov    (%eax),%edx
    273e:       c6 42 01 ff             movb   $0xff,0x1(%edx)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    2742:       8b 49 18                mov    0x18(%ecx),%ecx
    2745:       8b 11                   mov    (%ecx),%edx
    2747:       89 50 08                mov    %edx,0x8(%eax)
    274a:       89 01                   mov    %eax,(%ecx)
    274c:       8b 41 04                mov    0x4(%ecx),%eax
    274f:       85 c0                   test   %eax,%eax
    2751:       75 05                   jne    2758 <aic7xxx_free_scb+0x4c>
    2753:       8b 01                   mov    (%ecx),%eax
    2755:       89 41 04                mov    %eax,0x4(%ecx)
    2758:       c3                      ret

The underlined row has the faulting instruction. But, %eax doesn't
change, (%eax) doesn't change either... what could cause %edx to become 0
*so late*? We are in an ISR, so an interrupt couldn't do it, and the
system is UP so SMP locking isn't an issue either.

And what's causing all those %edx reloads, I can't find any volatiles in
the structure pointed to by %eax (struct aic7xxx_scb)?

I'm truly confused..

Thanks,
Ion

-- 
  It is better to keep your mouth shut and be thought a fool,
            than to open it and remove all doubt.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/
Oops with 2.4.0: race in aic7xxx, faulty cpu or voodoo magic

Reply via email to