[clearview-discuss] three-way deadlock with softmac/ce

Peter Memishian Sat, 16 Feb 2008 19:59:24 -0500

During IPMP testing, I hit an interesting deadlock between softmac/GLDv3
and ce.  Thread 1 grabbed di_lock as RW_WRITER (via dls_multicst_remove()),
sent a DL_DISABMULTI_REQ downstream, and is blocked waiting for an ACK:


  stack pointer for thread 2a10046fca0: 2a10046eda1
  [ 000002a10046eda1 cv_timedwait+0x8c() ]
    000002a10046ee51 softmac_output+0x80()
    000002a10046ef01 mac_multicst_remove+0xc4()
    000002a10046efb1 dls_multicst_remove+0x60()
    000002a10046f061 proto_disabmulti_req+0xbc()
    000002a10046f111 dld_wput_nondata_task+0xf0()
    000002a10046f1c1 taskq_d_thread+0xbc()
    000002a10046f291 thread_start+4()

Thread 2 is an interrupt that happened to come in after thread 1 grabbed
di_lock but before ce handled the DL_DISABMULTI_REQ.  Inside ce_intr(),
it grabbed a ce-internal lock as RW_READER and called putnext() while
still holding it.  It's now blocked in dls_accept() trying to acquire
di_lock as RW_READER:

  stack pointer for thread 2a10007fca0: 2a10007e191
  [ 000002a10007e191 turnstile_block+0x5a4() ]
    000002a10007e241 rw_enter_sleep+0x168()
    000002a10007e2f1 dls_accept+0x1c()
    000002a10007e3a1 i_dls_link_rx+0x260()
    000002a10007e4d1 mac_do_rx+0xb0()
    000002a10007e581 putnext+0x3f4()
    000002a10007e631 ce_intr+0x1a8c()
    000002a10007f1d1 pci_intr_wrapper+0xe8()
    000002a10007f291 intr_thread+0x2b8()

Thread 3 is the taskq handling the DL_DISABMULTI_REQ.  It's trying to
acquire the aforementioned ce lock as RW_WRITER, but is blocked because
thread 2 holds it as RW_READER:

  stack pointer for thread 2a100157ca0: 2a100156691
  [ 000002a100156691 turnstile_block+0x5a4() ]
    000002a100156741 rw_enter_sleep+0x1b0()
    000002a1001567f1 ce_dmreq+0xc8()
    000002a1001568b1 ce_proto+0x1d8()
    000002a100156961 ce_wsrv+0x2d30()
    000002a100157061 runservice+0x6c()
    000002a100157111 stream_service+0x190()
    000002a1001571c1 taskq_d_thread+0xbc()
    000002a100157291 thread_start+4()

So, T1 is blocked waiting for T3, T3 is blocked waiting for T2, and T2 is
blocked waiting for T1.  Seems like the right fix is to change ce not to
hold a lock across putnext(), but that may be a high-risk change and there
may be other legacy drivers that have a similar flaw.  So I'm interested
to hear from Thiru on whether his new GLDv3 locking design would also
resolve this deadlock.
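
A minimal sketch, not taken from any existing tool, of the cycle above as a
wait-for graph: each blocked thread points at the thread it is waiting on,
and following the edges from any of T1, T2, or T3 leads back to the starting
thread.  The waits_for array and in_cycle() are illustrative names only.

```c
#include <assert.h>

/* Thread labels from the report. */
enum { T1, T2, T3, NTHREADS };

/* waits_for[i] = thread that i is blocked on, or -1 if i is runnable.
 * In this model each blocked thread waits on exactly one other thread. */
static const int waits_for[NTHREADS] = {
    [T1] = T3,  /* softmac_output() cv_timedwait: needs T3 to handle DL_DISABMULTI_REQ */
    [T2] = T1,  /* dls_accept(): wants di_lock as RW_READER, T1 holds it as RW_WRITER */
    [T3] = T2,  /* ce_dmreq(): wants the ce lock as RW_WRITER, T2 holds it as RW_READER */
};

/* Follow wait-for edges from start; return 1 if they lead back to start.
 * The walk is bounded by n steps, so it terminates even if start is not
 * itself on a cycle. */
static int in_cycle(const int *wf, int n, int start)
{
    assert(start >= 0 && start < n);
    int cur = wf[start];
    for (int steps = 0; steps < n && cur != -1; steps++) {
        if (cur == start)
            return 1;
        cur = wf[cur];
    }
    return 0;
}
```

If any one edge were missing (say T2 were not blocked), in_cycle() would
return 0 for every thread and the remaining waits would eventually drain.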

-- 
meem
