Quoth Attilio Rao on Thursday, 18 August 2011:
> 2011/8/18 Hiroki Sato <[email protected]>:
> > Hiroki Sato <[email protected]> wrote
> >  in <[email protected]>:
> >
> > hr> Attilio Rao <[email protected]> wrote
> > hr>   in 
> > <caj-fndcdow0_b2mv0lzeo-tpea9+7oanj7ihvkqsm4j4b0d...@mail.gmail.com>:
> > hr>
> > hr> at> 2011/8/17 Hiroki Sato <[email protected]>:
> > hr> at> > Hi,
> > hr> at> >
> > hr> at> > Mike Tancsa <[email protected]> wrote
> > hr> at> >  in <[email protected]>:
> > hr> at> >
> > hr> at> > mi> On 7/7/2011 7:32 AM, Mike Tancsa wrote:
> > hr> at> > mi> > On 7/7/2011 4:20 AM, Kostik Belousov wrote:
> > hr> at> > mi> >>
> > hr> at> > mi> >> BTW, we had a similar panic, "spinlock held too long", the spinlock
> > hr> at> > mi> >> is the sched lock N, on a busy 8-core box recently upgraded to
> > hr> at> > mi> >> stable/8. Unfortunately, the machine hung while dumping core, so the
> > hr> at> > mi> >> stack trace for the owner thread was not available.
> > hr> at> > mi> >>
> > hr> at> > mi> >> I was unable to draw any conclusion from the data that was present.
> > hr> at> > mi> >> If the situation is reproducible, you could try to revert r221937.
> > hr> at> > mi> >> This is pure speculation, though.
> > hr> at> > mi> >
> > hr> at> > mi> > Another crash just now after 5 hours of uptime. I will try to revert
> > hr> at> > mi> > r221937, unless there is any extra debugging you want me to add to
> > hr> at> > mi> > the kernel instead?
> > hr> at> >
> > hr> at> >  I am also suffering from a reproducible panic on an 8-STABLE box, an
> > hr> at> >  NFS server with heavy I/O load.  I could not get a kernel dump
> > hr> at> >  because this panic locked up the machine just after it occurred, but
> > hr> at> >  according to the stack trace it was the same as the one posted.
> > hr> at> >  Switching to an 8.2R kernel avoids this panic.
> > hr> at> >
> > hr> at> >  Any progress on the investigation?
> > hr> at>
> > hr> at> Hiroki,
> > hr> at> how easily can you reproduce it?
> > hr>
> > hr>  It takes 5-10 hours.  I installed another kernel for debugging just
> > hr>  now, so I think I will be able to collect more detailed information
> > hr>  in a couple of days.
> > hr>
> > hr> at> It would be important to have a DDB textdump with the following information:
> > hr> at> - bt
> > hr> at> - ps
> > hr> at> - show allpcpu
> > hr> at> - alltrace
> > hr> at>
> > hr> at> Alternatively, a coredump from a kernel with the stop-CPU patch, which
> > hr> at> Andriy can provide.
> > hr>
> > hr>  Okay, I will post them once I can get another panic.  Thanks!
> >
> >  I got the panic with a crash dump this time.  The result of bt, ps,
> >  allpcpu, and traces can be found at the following URL:
> >
> >  http://people.allbsd.org/~hrs/FreeBSD/pool-panic_20110818-1.txt
> 
> Actually, I think I see the bug here.
> 
> In callout_cpu_switch(), if a low-priority thread is migrating the
> callout and gets preempted after the outgoing CPU's queue lock is
> released (and is scheduled again only much later), we get this problem.
> 
> To fix this bug it could be enough to use a critical section, but I
> think this should really be interrupt safe, so I'd wrap that window
> in spinlock_enter()/spinlock_exit(). Fortunately, callout_cpu_switch()
> should be called rarely, and we already do expensive locking
> operations in the callout code, so we should not have problems
> performance-wise.
> 
> Could the people I CC'ed here try the following patch with the same
> kernel options that originally led to the deadlock? (That is, revert
> any debugging patches or options you added for the moment.)
> http://www.freebsd.org/~attilio/callout-fixup.diff
> 
> Please note that this patch is for stable/8. If you can confirm that
> it works, I'll commit it to -CURRENT and then merge it back as soon as
> possible.
> 
> Thanks,
> Attilio
> 

Thanks, Attilio.  I've applied the patch and removed the extra debug
options I had added (though keeping debug symbols).  I'll let you know if
I experience any more panics.

Regards,

-- 
.O. | Sterling (Chip) Camden      | http://camdensoftware.com
..O | [email protected] | http://chipsquips.com
OOO | 2048R/D6DBAF91              | http://chipstips.com
