Terry Kennedy wrote:
I'm reposting this over here at the suggestion of the Forums moderator.
The original post is at http://forums.freebsd.org/showthread.php?t=14163
Got an interesting crash just now (well, as interesting as a crash on a
soon-to-be production system can be).
This is 8-STABLE/amd64, last cvsup'd early in the morning of May 9th.
The system didn't complete the crash dump, so it needed a manual reset to get
it going again.
The crash was a "page fault while in kernel mode" with the current process
being the interrupt service routine for the bce0 GigE. Things progressed
reasonably until partway through the dump, when the system locked up with a
"Sleeping thread (tid 100028, pid 12) owns a non-sleepable lock". That's the
same PID as reported in the main crash.
Hmm. You could try changing the code to not do a nested panic in that
case. You would update subr_turnstile.c to just return if panicstr is
not NULL rather than calling panic. However, there is still a good
chance you will end up deadlocking in that case. I have another patch I
can send you next week that prevents blocking on mutexes during a panic
which may also help.
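
To illustrate the first idea, here is a rough user-space sketch, not the actual kernel code. It assumes the check in subr_turnstile.c prints the "Sleeping thread" message and then calls panic(). The names sleeping_owner_check(), mock_panic() and nested_panics are hypothetical stand-ins; only panicstr mirrors a real kernel global.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Stand-in for the kernel global: NULL until the first panic is in progress. */
static const char *panicstr = NULL;

/* Counts panics raised while one is already in progress (illustration only). */
static int nested_panics = 0;

/* Hypothetical stand-in for panic(9); the real function does not return. */
static void
mock_panic(const char *msg)
{
	assert(msg != NULL);
	if (panicstr != NULL)
		nested_panics++;	/* this is the nested panic we want to avoid */
	else
		panicstr = msg;
}

/*
 * Hypothetical simplification of the sleeping-owner check.
 * Returns 1 if it escalated to a panic, 0 if it returned quietly.
 */
static int
sleeping_owner_check(int owner_is_sleeping, int tid, int pid)
{
	if (!owner_is_sleeping)
		return (0);

	printf("Sleeping thread (tid %d, pid %d) owns a non-sleepable lock\n",
	    tid, pid);

	/* Suggested change: if a panic is already under way, just return. */
	if (panicstr != NULL)
		return (0);

	mock_panic("sleeping thread");
	return (1);
}
```

With the guard in place, the first call on a sleeping owner still panics, but a second call made while the dump is being written only prints the message and returns. As noted above, that does not rule out a deadlock later on.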
3) Is there any way to rig the system to obtain more info if this happens
again? Right now I'm using an embedded remote console server, but I could
switch the system to a serial port if enabling the kernel debugger might help.
But I think that the sleeping thread bit would happen even at the debugger
prompt, wouldn't it?
Include DDB in the kernel and perhaps enable the 'debug.trace_on_panic' sysctl knob.
I just booted the new kernel and tried this again, and got another crash. The
message is identical to the first, except that the instruction pointer changed
by 0x10 (presumably due to code differences between the old and new kernels)
and it got 6MB further writing the crash dump.
Since it seems I can reproduce this at will, I'll be glad to either perform
additional information-gathering or give a developer access to the box for
testing purposes.
Is it possible to correlate the source line in the kernel with the instruction
pointer in the panic?
If you are booted into the same kernel with the same modules loaded, you
can probably run 'kgdb' as root and do 'l *<instruction pointer>'.
--
John Baldwin
_______________________________________________
freebsd-stable mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable