Hi Liujun,
Ok.  Here is the way I would "attack" this problem.  First, load kmdb:

# mdb -K

Since this is sparc (based on the addresses), you should be able to do 
this from your console.  You may need to add "-F" to the mdb command 
line.  This should work, as you have a second cpu that is not hung.
This will drop you into kmdb.  There, put a breakpoint at 
ssfcp_handle_devices+44c (the location where ssfcp_outstanding_lun_cmds 
should return):

ssfcp_handle_devices+44c:b

Then, continue:

:c

If you hit the breakpoint, ssfcp_outstanding_lun_cmds is returning, so 
try the same thing with the next function up the stack.  If you don't 
hit the breakpoint, the code is looping in ssfcp_outstanding_lun_cmds 
or in something it calls.  In that case, put a breakpoint at 
ssfcp_outstanding_lun_cmds+18 and again continue.

If you hit the breakpoint, start single stepping and, at the same time, 
look at the C code to see if there is a loop.  Basically, if the thread 
is not switching, it implies that something in the stack trace you have 
is not returning.
To single step, you can use :s (step) or :e (step over), or, better, 
just type ']' or '['.  (Then you don't have to type a carriage return.)  
'[' steps over function calls, ']' steps into them.  I suggest '[' to 
start with, or you will be there all day.  (You may be there all day 
anyway, but you can save yourself a little time.)
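
Putting the steps above together, the session would look roughly like 
this.  This is only a sketch: the [0]> prompt assumes you land on cpu 0, 
and the ::events and $c lines are my additions, just to confirm the 
breakpoint is set and to see where you stopped.

# mdb -K
[0]> ssfcp_handle_devices+44c:b
[0]> ::events
[0]> :c

If that breakpoint never fires, get back into kmdb (running mdb -K 
again should do it) and move the breakpoint down a level:

[0]> ssfcp_outstanding_lun_cmds+18:b
[0]> :c
[0]> $c
[0]> [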

If you need more detail, let me know.  Also, this may be a known 
problem, so maybe the first thing is to look in the bug listings
to see if it's already known.

I hope this helps.

max

liujun wrote:
> max, 
>
> The stack for that thread is :
>   
>> 2a100371cc0::findstack -v
>>     
> stack pointer for thread 2a100371cc0: 2a100370d61 [ 000002a100370d61 ktl0+0x48() ]
>   000002a100370eb1 ssfcp_outstanding_lun_cmds+0x18(60001918fd8, 0, 60003081a40, 60003590228, 1, 100000)
>   000002a100370f61 ssfcp_handle_devices+0x444(12b9ff0, 60001918fd8, 2, 60005c977b8, 1, 60001918ff0)
>   000002a100371071 ssfcp_statec_callback+0x614(60005c977b8, 1606d, 2, 60000205000, 600019433a8, 404)
>   000002a100371141 fctl_ulp_statec_cb+0x250(1, 2, 6000015d1d8, 6000577d188, 60000106000, ff000000)
>   000002a100371201 taskq_thread+0x1a4(60000010f90, 60000010f38, 50000, 5208a1333c14, 2a100371aca, 2a100371ac8)
>   000002a1003712d1 thread_start+4(60000010f38, 0, 0, 0, 0, 0)
>   
>
>
>   
>> ssfcp_outstanding_lun_cmds+0x18::dis 
>>     
> ssfcp_handle_ipkt_errors+0x2c4: mov       %i0, %o0
> ssfcp_handle_ipkt_errors+0x2c8: sra       %l0, 0, %i0
> ssfcp_handle_ipkt_errors+0x2cc: ret
> ssfcp_handle_ipkt_errors+0x2d0: restore
> ssfcp_outstanding_lun_cmds:     save      %sp, -0xb0, %sp
> ssfcp_outstanding_lun_cmds+4:   ldx       [%i0 + 0x20], %i2
> ssfcp_outstanding_lun_cmds+8:   cmp       %i2, 0
> ssfcp_outstanding_lun_cmds+0xc: be,pn     %xcc, +0x7c   <ssfcp_outstanding_lun_cmds+0x88>
> ssfcp_outstanding_lun_cmds+0x10:clr       %i1
> ssfcp_outstanding_lun_cmds+0x14:mov       %i2, %o0
> ssfcp_outstanding_lun_cmds+0x18:call      -0x2891ac     <mutex_enter>
> ssfcp_outstanding_lun_cmds+0x1c:nop
> ssfcp_outstanding_lun_cmds+0x20:ldx       [%i2 + 0x10], %i3
> ssfcp_outstanding_lun_cmds+0x24:cmp       %i3, %i1
> ssfcp_outstanding_lun_cmds+0x28:be,pn     %xcc, +0x48   <ssfcp_outstanding_lun_cmds+0x70>
> ssfcp_outstanding_lun_cmds+0x2c:nop
> ssfcp_outstanding_lun_cmds+0x30:ld        [%i3 + 0x54], %i4
> ssfcp_outstanding_lun_cmds+0x34:cmp       %i4, 2
> ssfcp_outstanding_lun_cmds+0x38:be,pn     %icc, +0x20   <ssfcp_outstanding_lun_cmds+0x58>
> ssfcp_outstanding_lun_cmds+0x3c:nop
> ssfcp_outstanding_lun_cmds+0x40:ldx       [%
>  
>  
> This message posted from opensolaris.org
> _______________________________________________
> mdb-discuss mailing list
> mdb-discuss at opensolaris.org
>
>   

