Re: amr still seems to have issues.

Mike Smith Fri, 21 Apr 2000 09:36:42 -0700
> * Mike Smith <[EMAIL PROTECTED]> [000420 11:39] wrote:
> > > Hi, we're running 4.0-stable as of Sat Apr 15 18:39:08 PDT 2000,
> > > which includes the recent amr fixes that we were hoping would cure
> > > the lockups with amr.  Unfortunately we are now experiencing reboots;
> > > the messages file reveals this:
> > >
> > > Apr 15 13:31:06 abacus /kernel: amr0: command 31 wedged after 30 seconds
> > 
> > This is extra-bad.  Without more feedback from the controller (no 
> > documentation from AMI yet, sorry. 8() I can only wonder whether you're 
> > getting a SCSI bus error of some sort that's causing the kernel to time 
> > these commands out (because the controller is taking too long to respond).
> > 
> > You could try increasing the timeout allowance in amr_periodic(), or just 
> > disable the poll entirely.  This won't help if the controller is really 
> > dropping commands, though.
> > 
> > > Right now I'm attempting to capture a serial console log to see what's
> > > going on. However, this box has been in production (and doing miserably)
> > > for some time now, so debugging is difficult as well as time-consuming
> > > when I really need to be working on other issues.
> > 
> > At this point, I have no other ideas, sorry.
> 
> Here's something; I hope it helps:

Hmm.  Looks like I'm not retiring the wedged command correctly.  This is 
a symptom, rather than the real problem, though.  See if disabling the 
command timeout stuff makes the system happy - either these commands are 
just taking a _long_ time to complete, or you have another problem.
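For illustration only, here is a minimal standalone C sketch of the two ideas above: a periodic timeout check that can be switched off, and retiring a wedged command exactly once. It is not the amr(4) code. struct cmd, cmd_timeout_secs (0 disables the check), cmd_retire(), cmd_intr_done() and periodic_scan() are hypothetical names chosen for the sketch. It also assumes one possible reading of the panic: if the timeout path completes a command and the controller later completes it as well, the buffer is finished twice. That kind of double completion could plausibly trip "biodone: page busy < 0".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

#define MAX_CMDS 8

/* Hypothetical per-command state; not the real amr(4) structures. */
struct cmd {
    bool busy;        /* submitted to the controller, not yet retired */
    long started;     /* submission time, in seconds */
    int  completions; /* times the completion path ran for this command */
};

static struct cmd cmds[MAX_CMDS];

/* Hypothetical tunable: seconds before a busy command counts as wedged.
 * Setting it to 0 disables the periodic timeout check entirely. */
static long cmd_timeout_secs = 30;

static void cmd_start(int slot, long now)
{
    assert(slot >= 0 && slot < MAX_CMDS);
    cmds[slot].busy = true;
    cmds[slot].started = now;
}

/* Retire a command exactly once. The busy flag is cleared in this one
 * place, so whichever path gets here second sees it already retired. */
static bool cmd_retire(int slot)
{
    if (!cmds[slot].busy)
        return false;           /* already retired: ignore */
    cmds[slot].busy = false;
    cmds[slot].completions++;   /* stands in for the real completion (biodone) */
    return true;
}

/* Hypothetical interrupt path: the controller reports the slot as done. */
static bool cmd_intr_done(int slot)
{
    return cmd_retire(slot);
}

/* Periodic poll: returns the number of commands timed out on this pass. */
static int periodic_scan(long now)
{
    int wedged = 0;

    if (cmd_timeout_secs == 0)
        return 0;               /* timeout check disabled */
    for (int i = 0; i < MAX_CMDS; i++) {
        if (cmds[i].busy && now - cmds[i].started >= cmd_timeout_secs) {
            printf("cmd %d wedged after %ld seconds\n",
                   i, now - cmds[i].started);
            if (cmd_retire(i))
                wedged++;
        }
    }
    return wedged;
}
```

With a single retirement point, a late interrupt for a command the poll has already timed out is simply ignored rather than completing the buffer a second time. A real driver would also need to keep a late completion from being credited to a new command that reuses the same slot. This sketch does not attempt that.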

I _still_ think you have disk, cable or enclosure issues, but now that 
you've precipitated this case I can go look at what I'm doing wrong here. 
Thanks!

> amr0: command 40 wedged after 30 seconds
> biodone: page busy < 0, pindex: 144, foff: 0x(0,90000), resid: 4096, index: 0
>  iosize: 8192, lblkno: 72, flags: 0x30020aa0, npages: 2
>  valid: 0xff, dirty: 0x0, wired: 1
> panic: biodone: page busy < 0
> 
> mp_lock = 01000001; cpuid = 1; lapic.id = 00000000
> boot() called on cpu#1
> 
> syncing disks... 
> 
> Fatal trap 12: page fault while in kernel mode
> mp_lock = 01000002; cpuid = 1; lapic.id = 00000000
> fault virtual address   = 0x30
> fault code              = supervisor read, page not present
> instruction pointer     = 0x8:0xc0226765
> stack pointer           = 0x10:0xff80dd9c
> frame pointer           = 0x10:0xff80dda0
> code segment            = base 0x0, limit 0xfffff, type 0x1b
>                         = DPL 0, pres 1, def32 1, gran 1
> processor eflags        = interrupt enabled, resume, IOPL = 0
> current process         = Idle
> interrupt mask          = bio  <- SMP: XXX
> trap number             = 12
> panic: page fault
> mp_lock = 01000002; cpuid = 1; lapic.id = 00000000
> boot() called on cpu#1
> Uptime: 2d4h11m39s
> amrd0: still open, can't shutdown
> 
> dumping to dev #da/0x20001, offset 128
> dump 1023 1022 Aborting dump due to I/O error.
> (da0:ahc1:0:6:0): WRITE(06). CDB: a 7 da f7 8 0 
> (da0:ahc1:0:6:0): error code 0 at block no. -964632618 (decimal)
> failed, reason: i/o error
> Automatic reboot in 15 seconds - press a key on the console to abort
> Rebooting...
> 
> Any ideas?
> 
> -- 
> -Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]
> "I have the heart of a child; I keep it in a jar on my desk."
> 

-- 
\\ Give a man a fish, and you feed him for a day. \\  Mike Smith
\\ Tell him he should learn how to fish himself,  \\  [EMAIL PROTECTED]
\\ and he'll hate you for a lifetime.             \\  [EMAIL PROTECTED]




To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-stable" in the body of the message