Re: Oopses with 2.4.0-test5 in sym53c8xx

D. Lance Robinson Mon, 07 Aug 2000 15:37:55 -0700
Chris,

I think Gerard is on vacation. But, before he left, he made available some
code that you should probably try with your problem...

ftp://ftp.tux.org/pub/roudier/drivers/linux/stable/
        sym-1.7.1-ncr-3.4.1.tar.gz

<>< Lance.

Chris Meadors wrote:

> I've been seeing an oops on this machine ever since I first started it
> up.  I've been going over this with the people on linux-kernel.  Because
> of the nature of the oopses they weren't being logged to disk, so I didn't
> have a good decode.  Today I set up a serial console and finally got to
> see where the bugger was oopsing.
>
> I don't know how many people here are also on l-k, but I'll give a brief
> run down of my system configuration and history of the problem for the
> benefit of those who aren't.
>
> I have a motherboard with dual SYM53C896 controllers.  I'm only using
> one.  It is connected to an external RAID module.  This RAID controller
> has 3 SYM53C895s on it.  Obviously one channel is connected to the
> controller on the motherboard, and another is to the drive array.  But the
> third channel is connected to a second motherboard.  Eventually (when I
> can get a stock kernel stable) I will be running a clustered file system
> to be shared between the two motherboards.  Right now the second board is
> only mounting the file system read only so it isn't messing with anything.
>
> This is a SMP machine with a ServerWorks chipset and 1 GB of RAM.
>
> Please do not hesitate to ask me for any additional information.  This
> machine has no valuable data on it so I'm not afraid to try anything.
>
> First I booted the machine from a boot/root floppy set that I made using
> the 2.4.0-test5 kernel I had configured for this machine.  I NFS mounted
> my work machine to copy the directory structure to the new machine's local
> drives.  Part way through the copy the machine oopsed.  I re-mkfs'ed the
> drives and tried again, same thing.  So next time I did it in smaller
> chunks and eventually got everything copied without problems.  I set up the
> fstab and lilo.conf and rebooted the machine.  It came up fine.
>
> The next step in my usual installation is to recompile everything I'm
> going to be using on the machine.  So I started with the kernel, to make
> sure I had all the options set exactly to my liking.  That went fine.  So
> on to glibc (2.1.3).  Part way through the build sed complained about not
> being able to find a file.  I looked and the file was there.  So I typed
> make again, it got past that point only to stop with another missing file
> error.  Make again, something in the build process segfaulted this
> time.  Make one more time, and I got an oops.
>
> I tried glibc a few more times, totally deleting the directory and
> starting over, never with any luck.  So I figured I'd try rebuilding my
> build programs.  gcc, binutils, and make.  I got all of them to build just
> fine.  Even gcc's "make bootstrap" went without a hitch.  I tried glibc
> again, still no good.
>
> Somewhere in here I thought I might be seeing a hardware problem.  So I
> tested everything as well as I could, running memtest86, burnP6, and
> burnBX for hours on end.  Not one lock-up or error anywhere.  Next I moved
> to bonnie++.  One bonnie++ running by itself is no problem.  But 4 of
> them started at the same time would oops the machine when it got to the
> "intelligently writing" part.
>
> I started all this on Monday.  It was now Friday.  I had been running the
> RAID controller on two 16 MB non-parity SIMMs I found lying around.  I
> really wanted parity RAM, so I had ordered two 64 MB 60ns EDO parity
> SIMMs.  They arrived on Friday and I installed them.  The machine seemed a
> little more stable.  So I ran bonnie++ again.  I was so happy to see it
> pass the "intelligently writing", so I went dancing around the office.  When
> I came back I found that it had oopsed during the part where it creates
> the thousands of files.  So I restarted the machine and watched the
> whole bonnie++ process.  It did the creating files thing for a really long
> time (I'm still using the 4 bonnie++s at once).  So it is probably just
> about done when the oops comes.
>
> Just to make myself feel a little better I went back to glibc.  This time
> with the 128MB of cache on the RAID controller it compiled all the way
> through without problem.  But 4 bonnie++s can always kill it without fail.
>
> This oops is from 2.4.0-test5 (I have also tried -test6-pre6 and got
> just about the same oops) dying during the create sequential files part of 4
> bonnie++s running at the same time:
>
> Unable to handle kernel NULL pointer dereference at virtual address
> 00000000
> c0119024
> *pde = 00000000
> Oops: 0000
> CPU:    1
> EIP:    0010:[<c0119024>]
> Using defaults from ksymoops -t elf32-i386 -a i386
> EFLAGS: 00010087
> eax: c026dcd8   ebx: f7e8d6c0   ecx: 00000023   edx: 00000000
> esi: f7e8d720   edi: 00000000   ebp: c1e19e14   esp: c1e19d84
> ds: 0018   es: 0018   ss: 0018
> Process swapper (pid: 0, stackpage=c1e19000)
> Stack: f7e8d6c0 f7e8d720 f7d65d0c f7eb3e10 c01a19fe f7eb6000 dd3b1800
> dd3b1800
>        f7d64e62 dd3b1c72 f7d64e00 00000022 f7eb6000 f7eb3e00 00000002
> 00000046
>        f7d64e00 f7eb6000 00000202 f7d65c00 f7edd980 00000286 00000082
> f7edd980
> Call Trace: [<c01a19fe>] [<c0168e22>] [<c01addf8>] [<c01ad5bb>]
> [<c01a9092>] [<c01ad790>] [<c01ad9dd>]
>        [<c019f707>] [<c01afc3f>] [<c01b0196>] [<c01a76b8>] [<c010c8b1>]
> [<c010ca96>] [<c0109390>] [<c0109390>]
>        [<c010b1e0>] [<c0109390>] [<c0109390>] [<c0100018>] [<c01093bd>]
> [<c0109422>] [<c010cad4>] [<c0206ce6>]
> Code: 8b 17 89 d0 24 df 85 45 f8 0f 84 54 03 00 00 8b 4d f8 89 4d
>
> >>EIP; c0119024 <__wake_up+84/738>   <=====
> Trace; c01a19fe <ncr_start_next_ccb+56/88>
> Trace; c0168e22 <blkdev_release_request+3a/3c>
> Trace; c01addf8 <scsi_request_fn+20c/314>
> Trace; c01ad5bb <scsi_queue_next_request+47/110>
> Trace; c01a9092 <scsi_release_command+116/120>
> Trace; c01ad790 <__scsi_end_request+10c/118>
> Trace; c01ad9dd <scsi_io_completion+189/358>
> Trace; c019f707 <rw_intr+1cb/1d8>
> Trace; c01afc3f <scsi_old_done+43/5b4>
> Trace; c01b0196 <scsi_old_done+59a/5b4>
> Trace; c01a76b8 <sym53c8xx_intr+80/94>
> Trace; c010c8b1 <handle_IRQ_event+4d/78>
> Trace; c010ca96 <do_IRQ+a6/f4>
> Trace; c0109390 <default_idle+0/34>
> Trace; c0109390 <default_idle+0/34>
> Trace; c010b1e0 <ret_from_intr+0/20>
> Trace; c0109390 <default_idle+0/34>
> Trace; c0109390 <default_idle+0/34>
> Trace; c0100018 <startup_32+18/cc>
> Trace; c01093bd <default_idle+2d/34>
> Trace; c0109422 <cpu_idle+3e/54>
> Trace; c010cad4 <do_IRQ+e4/f4>
> Trace; c0206ce6 <vsprintf+33e/36c>
> Code;  c0119024 <__wake_up+84/738>
> 00000000 <_EIP>:
> Code;  c0119024 <__wake_up+84/738>   <=====
>    0:   8b 17                     mov    (%edi),%edx   <=====
> Code;  c0119026 <__wake_up+86/738>
>    2:   89 d0                     mov    %edx,%eax
> Code;  c0119028 <__wake_up+88/738>
>    4:   24 df                     and    $0xdf,%al
> Code;  c011902a <__wake_up+8a/738>
>    6:   85 45 f8                  test   %eax,0xfffffff8(%ebp)
> Code;  c011902d <__wake_up+8d/738>
>    9:   0f 84 54 03 00 00         je     363 <_EIP+0x363> c0119387
> <__wake_up+3e7/738>
> Code;  c0119033 <__wake_up+93/738>
>    f:   8b 4d f8                  mov    0xfffffff8(%ebp),%ecx
> Code;  c0119036 <__wake_up+96/738>
>   12:   89 4d 00                  mov    %ecx,0x0(%ebp)
>
> Aiee, killing interrupt handler
> Kernel panic: Attempted to kill the idle task!
> NMI Watchdog detected LOCKUP on CPU0, registers:
> CPU:    0
> EIP:    0010:[<c020c7a4>]
> EFLAGS: 00000086
> eax: f7d65a00   ebx: f7eb6000   ecx: f7eb6054   edx: f7eb6000
> esi: 00000286   edi: 00000000   ebp: c02b4c40   esp: c0277f1c
> ds: 0018   es: 0018   ss: 0018
> Process swapper (pid: 0, stackpage=c0277000)
> Stack: f7eb6000 c01a76cc c0121b15 f7eb6000 00000000 00000000 00000000
> c02b4c40
>        c017c546 f7edd680 00000086 c011e351 c02cc120 00000000 c011e257
> 00000000
>        00000001 c02bc1a0 00000000 0000000e c011e0fc c02bc1a0 c02cc484
> c02b2800
> Call Trace: [<c01a76cc>] [<c0121b15>] [<c017c546>] [<c011e351>]
> [<c011e257>] [<c011e0fc>] [<c010cad4>]
>        [<c0109390>] [<c0109390>] [<c010b1e0>] [<c0109390>] [<c0109390>]
> [<c0100018>] [<c01093bd>] [<c0109422>]
>        [<c0105000>] [<c01001d0>]
> Code: 80 3d 64 42 26 c0 00 f3 90 7e f5 e9 4b af f9 ff 80 7b 44 00
>
> >>EIP; c020c7a4 <stext_lock+4084/9b20>   <=====
> Trace; c01a76cc <sym53c8xx_timeout+0/68>
> Trace; c0121b15 <timer_bh+259/2b0>
> Trace; c017c546 <rs_interrupt_single+72/88>
> Trace; c011e351 <bh_action+4d/b4>
> Trace; c011e257 <tasklet_hi_action+4f/7c>
> Trace; c011e0fc <do_softirq+5c/8c>
> Trace; c010cad4 <do_IRQ+e4/f4>
> Trace; c0109390 <default_idle+0/34>
> Trace; c0109390 <default_idle+0/34>
> Trace; c010b1e0 <ret_from_intr+0/20>
> Trace; c0109390 <default_idle+0/34>
> Trace; c0109390 <default_idle+0/34>
> Trace; c0100018 <startup_32+18/cc>
> Trace; c01093bd <default_idle+2d/34>
> Trace; c0109422 <cpu_idle+3e/54>
> Trace; c0105000 <empty_bad_page+0/1000>
> Trace; c01001d0 <L6+0/2>
> Code;  c020c7a4 <stext_lock+4084/9b20>
> 00000000 <_EIP>:
> Code;  c020c7a4 <stext_lock+4084/9b20>   <=====
>    0:   80 3d 64 42 26 c0 00      cmpb   $0x0,0xc0264264   <=====
> Code;  c020c7ab <stext_lock+408b/9b20>
>    7:   f3 90                     repz nop
> Code;  c020c7ad <stext_lock+408d/9b20>
>    9:   7e f5                     jle    0 <_EIP>
> Code;  c020c7af <stext_lock+408f/9b20>
>    b:   e9 4b af f9 ff            jmp    fff9af5b <_EIP+0xfff9af5b>
> c01a76ff <sym53c8xx_timeout+33/68>
> Code;  c020c7b4 <stext_lock+4094/9b20>
>   10:   80 7b 44 00               cmpb   $0x0,0x44(%ebx)
>
> --
> Two penguins were walking on an iceberg.  The first one said to the
> second, "you look like you are wearing a tuxedo."  The second one said,
> "I might be..."
>                                               --David Lynch, Twin Peaks
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
> the body of a message to [EMAIL PROTECTED]
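
For anyone reading the decode above: each ksymoops ">>EIP", "Trace" and
"Code" line has the form "address <symbol+offset/size>", with all numbers in
hex, so subtracting the offset from the address gives the start of the
function. Below is a minimal sketch of that arithmetic in Python. It is not
part of ksymoops; the helper name parse_symbol_line and the regular
expression are my own assumptions for illustration, and the sample lines are
copied from the first oops.

```python
import re

# Matches ksymoops lines such as "Trace; c01a19fe <ncr_start_next_ccb+56/88>".
# All numbers are hexadecimal: address, offset into the function, function size.
# (Hypothetical pattern written for this example, not taken from ksymoops.)
SYM_RE = re.compile(
    r"(?:>>EIP|Trace|Code);\s+([0-9a-f]{8})\s+<([^+>]+)\+([0-9a-f]+)/([0-9a-f]+)>"
)


def parse_symbol_line(line):
    """Return (address, symbol, offset, size, function_start), or None."""
    m = SYM_RE.search(line)
    if m is None:
        return None
    addr = int(m.group(1), 16)
    offset = int(m.group(3), 16)
    size = int(m.group(4), 16)
    # The function starts 'offset' bytes before the reported address.
    return addr, m.group(2), offset, size, addr - offset


# Sample lines copied from the first oops in the quoted report.
sample = [
    ">>EIP; c0119024 <__wake_up+84/738>   <=====",
    "Trace; c01a19fe <ncr_start_next_ccb+56/88>",
    "Trace; c01a76b8 <sym53c8xx_intr+80/94>",
]

for line in sample:
    parsed = parse_symbol_line(line)
    if parsed is not None:
        addr, sym, off, size, start = parsed
        print(f"{sym}: starts at {start:08x}, address is +{off:#x} of {size:#x} bytes")
```

The same arithmetic explains the branch target in the disassembly: the "je"
at _EIP+0x363 is c0119024 + 0x363 = c0119387, which ksymoops labels
__wake_up+3e7 (0x84 + 0x363).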

