[mdb-discuss] Debugging Solaris 8 kernel hang

Gavin Maltby Mon, 18 Sep 2006 22:10:06 +0100

Hi

On 09/18/06 20:01, Chad Mynhier wrote:
> I'm trying to debug a Solaris 8 kernel hang with mdb.  (This appears
> to be a kernel hang and not simply a process hang, as we can get no
> response from the server and need to drop it to the ok prompt and
> reboot.)
> 
> I've done "::walk thread | ::findstack".  Of 95 threads, I get a stack
> trace for 79, and of those, 63 are sitting in cv_wait().  Here are
> some of the stack traces:
> 
> 28
>        cv_wait+0x38()
>        taskq_thread+0x110()
>        thread_start+4()


Ignore - task queue threads waiting for work to be dispatched to them.

> 6
>        cv_wait+0x38()
>        md_daemon+0x144()
>        thread_start+4()

Ignore - metadisk/svm/lvm monitoring threads

> 4
>        cv_wait+0x38()
>        taskq_thread+0xd4()
>        thread_start+4()

Same as the taskq threads above (just a different cv_wait in taskq_thread).

> 4
>        cv_wait+0x38()
>        sqthread+0x14c()
>        thread_start+4()

sqthread is not one I recognise.  It does not show up in ON source,
either.  I'd guess that with 4 threads cv_wait'ing in here it's
likely not a problem.

> 
> 4
>        cv_wait+0x38()
>        ufs_thread_run+0x130()
>        ufs_thread_delete+0x4c()
>        thread_start+4()

Ignore.

> My first thought is that the above might be perfectly normal behavior.
> Is there anything I should be suspicious about?  If so, how can I
> probe deeper (given that this is Solaris 8)?  If not, does anyone have
> any pointers on what I could be looking at?

I'd start by looking at which threads are on cpu (::cpuinfo) and which have
been on cpu recently (t_disp_time relative to panic_lbolt and panic_lbolt64).
Also look for any kernel stacks waiting for locks - i.e. ending around
mutex_vector_enter; similarly for threads ending in sema_p/sema_wait and in
rw_rdlock, rw_wrlock etc.
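
With 95 threads it can be tedious to pick out the lock waiters by eye, so
here is a rough sketch of how the "::walk thread | ::findstack" output could
be post-processed once it has been saved to a file.  It is only an
illustration, not an mdb feature: the function names parse_stacks, summarize
and lock_waiters are made up for this example, and it assumes each stack
starts with a "stack pointer for thread" line and that frames look like
"cv_wait+0x38()", as in the traces quoted above.  The list of lock-wait
functions is just the set named above; extend it as needed.

```python
import re
from collections import Counter

# Functions named in the advice above as signs of a thread blocked on a lock.
# Assumption: this list is illustrative, not exhaustive.
LOCK_WAIT = ("mutex_vector_enter", "sema_p", "sema_wait",
             "rw_rdlock", "rw_wrlock")

# Assumed frame format: "<symbol>+<offset>(", e.g. "cv_wait+0x38(" or
# "thread_start+4(".  Leading addresses and brackets are skipped.
FRAME_RE = re.compile(r"([A-Za-z_][A-Za-z0-9_.$]*)\+(?:0x)?[0-9a-fA-F]+\(")

# Assumed header line that ::findstack prints before each stack.
HEADER = "stack pointer for thread"


def parse_stacks(text):
    """Split saved ::findstack output into one tuple of symbols per thread."""
    stacks = []
    current = None
    for line in text.splitlines():
        if line.startswith(HEADER):
            current = []
            stacks.append(current)
        elif current is not None:
            match = FRAME_RE.search(line)
            if match:
                current.append(match.group(1))
    # Threads for which no frames were found are dropped.
    return [tuple(s) for s in stacks if s]


def summarize(stacks):
    """Count identical stacks (offsets ignored)."""
    return Counter(stacks)


def lock_waiters(stacks, depth=4):
    """Return stacks with a lock-wait function among the top `depth` frames."""
    return [s for s in stacks if any(f in LOCK_WAIT for f in s[:depth])]
```

Usage would be along the lines of reading the saved output (say from a
hypothetical file stacks.txt), printing summarize(stacks).most_common() to
get the per-stack counts, and then looking closely at whatever
lock_waiters(stacks) returns.  The depth of 4 is a guess meant to allow for
frames such as swtch sitting above the lock routine.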

Cheers

Gavin

-- 
Gavin Maltby, Solaris Kernel Development.
