Tom Chen wrote:
Guys, thanks for your help! I finally found the cause: a thread holding the
mutex is preempted by a high-priority timer (soft) interrupt that also wants
the same mutex, so a deadlock occurs.
I did some research and noticed that it crashes only when my send( ) function
is called at the same time the 1-second timer function is invoked. In my
send( ) code, after a packet is prepared and saved in a DMA-accessible area, it
writes a device register to tell the hardware to read the packet and send it
out. The 1-second timer also needs to read hardware registers to get the
current link status. I use a hardware mutex to control access to the hardware.
So, if send( ) gets the mutex but is preempted by the 1-second timer, the timer
must still get this mutex before it can read or write the hardware, so it waits
for the mutex to become available. Since send( ) cannot run, the timer
interrupt waits forever and the system crashes.
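The contention described above can be modelled in user space. This is only a rough sketch using POSIX threads, not Solaris DDI code: hw_lock, send_begin(), send_end() and timer_try_poll() are hypothetical names, and pthread_mutex_trylock() stands in for the kernel's mutex_tryenter(9F). It shows the timer path noticing that the lock is busy and skipping the poll rather than waiting on a holder that cannot run.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for the driver's hardware mutex. */
static pthread_mutex_t hw_lock = PTHREAD_MUTEX_INITIALIZER;
static int link_polls;  /* counts successful "register reads" */

/* send() path: takes the lock before touching the device. */
static void send_begin(void)
{
    int rc = pthread_mutex_lock(&hw_lock);
    assert(rc == 0);
    (void)rc;
}

/* send() path: releases the lock once the device has been told to send. */
static void send_end(void)
{
    int rc = pthread_mutex_unlock(&hw_lock);
    assert(rc == 0);
    (void)rc;
}

/*
 * Timer path: never blocks. Returns true if the link status was polled,
 * false if the lock was busy and the poll was skipped until the next tick.
 */
static bool timer_try_poll(void)
{
    int rc = pthread_mutex_trylock(&hw_lock);
    if (rc == EBUSY)
        return false;           /* send() holds the lock; do not wait */
    assert(rc == 0);
    link_polls++;               /* pretend to read the link registers */
    rc = pthread_mutex_unlock(&hw_lock);
    assert(rc == 0);
    return true;
}
```

Skipping one poll is only acceptable if a link status that is a second stale does no harm; that is an assumption about the driver, not something stated in this thread.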
If timer( ) does not call mutex_enter( ) and just reads/writes hardware
registers directly, the deadlock no longer happens. But this solution is not
good. Any other ideas? Do you guys use a mutex lock to control access to a PCI
network card? Have you seen a similar issue?
Tom
Oliver, below is my analysis using mdb, following your article. What can you
see from it? Sorry, I do not have much experience with this. How do you get the
ACT tool?
panic[cpu0]/thread=ffffff0003eddc80:
Deadlock: cycle in blocking chain
ffffff0003eddaa0 genunix:turnstile_block+9f3 ()
ffffff0003eddb20 unix:mutex_vector_enter+38d ()
ffffff0003eddb50 qla:qla_link_state_machine+22 ()
ffffff0003eddb70 qla:qla_timer+78 ()
ffffff0003eddbd0 genunix:callout_execute+b1 ()
ffffff0003eddc60 genunix:taskq_thread+1dc ()
ffffff0003eddc70 unix:thread_start+8 ()
It seems the panic thread tried to acquire a DRIVER mutex and blocked. You
shouldn't do that. I don't think you can block in callout_execute
context, because that also blocks every thread that depends on callout
table processing. For example, a thread in your
driver might use cv_timedwait, which could also block because the
callout table is locked. At that point, the clock interrupt thread
could block as well.
If you can't avoid using a driver mutex or rwlock in the qla_timer-related
routines, maybe you can trigger another soft interrupt to handle the work.
ddi_intr_add_softint(9F) should work for you.
I'm not an expert in driver development; maybe others can give you a
better solution.
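The deferral idea can be sketched in user space as follows. This is an analogy only, written with POSIX threads and C11 atomics rather than the DDI soft-interrupt calls: struct demo_dev, demo_timer_cb() and demo_deferred_poll() are hypothetical names and do not reflect the qla driver's real layout. The timer callback takes no lock and only records that a link poll is wanted; a separate handler, running where blocking is acceptable, takes the hardware lock and does the work.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical per-device state for the sketch. */
struct demo_dev {
    pthread_mutex_t hw_lock;       /* plays the role of the hardware mutex */
    atomic_bool     poll_pending;  /* set by the timer, consumed by the handler */
    int             link_polls;    /* counts simulated register reads */
};

static void demo_dev_init(struct demo_dev *d)
{
    int rc = pthread_mutex_init(&d->hw_lock, NULL);
    assert(rc == 0);
    (void)rc;
    atomic_init(&d->poll_pending, false);
    d->link_polls = 0;
}

/* Timer callback: takes no lock, only notes that a poll is wanted. */
static void demo_timer_cb(struct demo_dev *d)
{
    atomic_store(&d->poll_pending, true);
}

/*
 * Deferred handler: runs in a context where blocking is allowed.
 * Returns true if a pending poll was carried out, false if none was pending.
 */
static bool demo_deferred_poll(struct demo_dev *d)
{
    if (!atomic_exchange(&d->poll_pending, false))
        return false;
    int rc = pthread_mutex_lock(&d->hw_lock);
    assert(rc == 0);
    d->link_polls++;               /* pretend to read the link registers */
    rc = pthread_mutex_unlock(&d->hw_lock);
    assert(rc == 0);
    return true;
}
```

In a real driver the handler would be whatever deferred context the driver registers (for example a soft interrupt), and the flag would be replaced or complemented by the trigger call for that mechanism.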
ffffff0003eddc80::findstack -v
stack pointer for thread ffffff0003eddc80: ffffff0003edd540
ffffff0003edd580 0x292()
ffffff0003edd5a0 7()
ffffff0003edd670 xc_common+0x3fb(fffffffffb81c240, 0, fffffffffb861981, ffffff0003edd680, 0, ff,
ffffff0003edd700)
ffffff0003edd700 xc_mbox_lock+0x10()
ffffff0003edd750 xc_wait_sync+0x2b(fffffffec586f180, 0, fffffffffb81bf5a, ffffff0003edd750, ff,
fffffffffb81c240)
ffffff0003edd7f0 x86pte_inval+0x1e3(fffffffec586f180, c1, 8000000002045561, 0)
ffffff0003edd880 hat_pte_unmap+0x21b(1b70e7008975471b, 1, fffffffffbbe8a53,
ffffff0003edd890, 1388)
ffffff0003edd890 1()
ffffff0003edd910 dosoftint_prolog+0xa2(246, 91409, ffffff0003edd920, 0)
ffffff0003edda00 panic+0x9c()
ffffff0003eddaa0 turnstile_block+0x9f3(0, 0, fffffffee1c178f8,
fffffffffbc04550, 0, 0)
ffffff0003eddb20 mutex_vector_enter+0x38d(fffffffee1c178f8)
ffffff0003eddb50 qla_link_state_machine+0x22()
ffffff0003eddb70 qla_timer+0x78()
ffffff0003eddbd0 callout_execute+0xb1(fffffffec65e8000)
ffffff0003eddc60 taskq_thread+0x1dc(fffffffec57f7698)
ffffff0003eddc70 thread_start+8()
fffffffee1c178f8::rwlock
            ADDR         OWNER/COUNT FLAGS          WAITERS
fffffffee1c178f8 READERS=23058428717  B001 ffffff0003eddc80 (W)
                                         |
                     HAS_WAITERS --------+
My blog's example is about a system hang with an rwlock. From your stack
backtrace, you can see your driver thread was acquiring a mutex
rather than an rwlock, so you should use ::mutex instead of ::rwlock. If
you check address fffffffee1c178f8 with the ::mutex dcmd, you might find the
owner of the mutex. Then you can run ::findstack against that owner.
You can then follow the deadlock chain with similar steps.
ACT should be available on sunsolve.sun.com, and it can find the
deadlocked threads automatically; in particular, it should work for your case.
But I'm not sure whether it's a free download. I have also heard from others
that there is another automated tool named SCAT available for download.
--
Cheers,
----------------------------------------------------------------------
Oliver Yang | [EMAIL PROTECTED] | x82229 | Work from office