Hi Jason,

Thanks for sharing the core dumps.

I've root-caused this as a deadlock. It is caused by using a spin lock
for the cv_timedwait_sig()/cv_signal() calls in
rtSemEventMultiWait()/RTSemEventSignal() in the vboxdrv module of
VirtualBox. You may want to try the latest VirtualBox to see whether the
issue is still there. See the analysis below for details:


> ::cpuinfo -v
 ID ADDR             FLG NRUN BSPL PRI RNRN KRNRN SWITCH THREAD           PROC
  0 fffffffffbc36f80  1b    1    2  -1   no    no t-5    ffffff0007805c60 (idle)
                       |    |    |
            RUNNING <--+    |    +--> PIL THREAD
              READY         |           2 ffffff00078cbc60
             EXISTS         |
             ENABLE         +-->  PRI THREAD           PROC
                                   60 ffffff0007e11c60 sched

 ID ADDR             FLG NRUN BSPL PRI RNRN KRNRN SWITCH THREAD           PROC
  1 ffffff01cbee7000  1f    0    0 168   no    no t-6968 ffffff0007c49c60 sched
                       |         |
            RUNNING <--+         +--> PIL THREAD
              READY                     9 ffffff0007c49c60
           QUIESCED                     6 ffffff0007c4fc60
             EXISTS                     - ffffff01d6d55740 VirtualBox
             ENABLE


Here we see that ffffff01d6d55740 was preempted by the e1000g interrupt
(IPL 6), and the e1000g interrupt was in turn preempted by the nvidia
graphics interrupt.

> ffffff0007c49c60::findstack -v
stack pointer for thread ffffff0007c49c60: ffffff0007c49b30
  ffffff0007c49bb0 rm_isr+0xaa()
  ffffff0007c49c00 av_dispatch_autovect+0x7c(10)
  ffffff0007c49c40 dispatch_hardint+0x33(10, 6)
  ffffff0007c4f450 switch_sp_and_call+0x13()
  ffffff0007c4f4a0 do_interrupt+0x9e(ffffff0007c4f4b0, b)
  ffffff0007c4f4b0 _interrupt+0xba()
  ffffff0007c4f5c0 default_lock_delay+0x8c(4)
  ffffff0007c4f630 lock_set_spl_spin+0xc2(ffffff01e4840d99, b, ffffff01e4840d9c, 6)
  ffffff0007c4f690 mutex_vector_enter+0x45e(ffffff01e4840d98)
  ffffff0007c4f6c0 RTSemEventSignal+0x6a()
  ffffff0007c4f740 0xfffffffff836c57b()
  ffffff0007c4f770 0xfffffffff836d73a()
  ffffff0007c4f830 vboxNetFltSolarisRecv+0x331()
  ffffff0007c4f880 VBoxNetFltSolarisModReadPut+0x107()
  ffffff0007c4f8f0 putnext+0x21e(ffffff01e425ed30, ffffff01ccb1a0a0)
  ffffff0007c4f950 dld_str_rx_raw+0xb3(ffffff01ce74a6c8, 0, ffffff01ccb1a0a0, ffffff0007c4f980)
  ffffff0007c4fa10 dls_rx_promisc+0x179(ffffff01ce74a6c8, 0, ffffff01ccb1a0a0, 0)
  ffffff0007c4fa50 mac_promisc_dispatch_one+0x5f(ffffff01df15ee78, ffffff01d20f7640, 0)
  ffffff0007c4fac0 mac_promisc_dispatch+0x105(ffffff01cdc5f788, ffffff01d20f7640, 0)
  ffffff0007c4fb10 mac_rx+0x3e(ffffff01cdc5f788, ffffff01cdb86ee8, ffffff01d20f7640)
  ffffff0007c4fb50 mac_rx_ring+0x4c(ffffff01cdc5f788, ffffff01cdb86ee8, ffffff01d20f7640, 1)
  ffffff0007c4fbb0 e1000g_intr+0x17e(ffffff01cd3ec000)
  ffffff0007c4fc00 av_dispatch_autovect+0x7c(30)
  ffffff0007c4fc40 dispatch_hardint+0x33(30, 0)
  ffffff0008b8e7d0 switch_sp_and_call+0x13()
  ffffff0008b8e820 do_interrupt+0x9e(ffffff0008b8e830, 0)
  ffffff0008b8e830 _interrupt+0xba()
  ffffff0008b8e960 untimeout_generic+0xaf(17e3e41c8, 0)
  ffffff0008b8e980 untimeout_default+0x15(17e3e41c8, 0)
  ffffff0008b8ea00 cv_timedwait_sig_internal+0x1f8(ffffff01dd2a2910, ffffff01dd2a2908, 553c76, 0)
  ffffff0008b8ea30 cv_timedwait_sig+0x1f(ffffff01dd2a2910, ffffff01dd2a2908, 553c76)
  ffffff0008b8ea90 rtSemEventMultiWait+0xe2()
  ffffff0008b8eae0 0xfffffffff8333a80()
  ffffff0008b8eb10 0xfffffffff833e60c()
  ffffff0008b8eb50 0xfffffffff833edeb()
  ffffff0008b8ec30 supdrvIOCtl+0xd62()
  ffffff0008b8ed00 VBoxDrvSolarisIOCtl+0x359()
  ffffff0008b8ed40 cdev_ioctl+0x45(10a00000000, c0185687, fffffd7ffe55cda0, 202003, ffffff01dd4364b0, ffffff0008b8ee24)
  ffffff0008b8ed80 spec_ioctl+0x83(ffffff01de3ca200, c0185687, fffffd7ffe55cda0, 202003, ffffff01dd4364b0, ffffff0008b8ee24, 0)
  ffffff0008b8ee00 fop_ioctl+0x7b(ffffff01de3ca200, c0185687, fffffd7ffe55cda0, 202003, ffffff01dd4364b0, ffffff0008b8ee24, 0)
  ffffff0008b8ef00 ioctl+0x18e(17, ffffffffc0185687, fffffd7ffe55cda0)
  ffffff0008b8ef10 sys_syscall+0x17b()
>


Anyway, the e1000g interrupt (thread ffffff0007c4fc60, which preempted
ffffff01d6d55740) is spinning on a spin lock:

> ffffff01e4840d98::mutex
            ADDR  TYPE             HELD MINSPL OLDSPL WAITERS
ffffff01e4840d98  spin              yes     11      0       -


The question now is: who is holding the spin lock? Let's print out all
the threads related to vboxdrv:

> ::stacks -a -m vboxdrv
THREAD           STATE    SOBJTYPE            COUNT
ffffff01ccad6e60 SLEEP    MUTEX                   1
                 swtch+0x147
                 turnstile_block+0x764
                 mutex_vector_enter+0x261
                 timeout_generic+0x87
                 cv_timedwait_sig_internal+0xef
                 cv_timedwait_sig+0x1f
                 rtSemEventWait+0xe2
                 0xfffffffff836d354
                 0xfffffffff833e6ae
                 0xfffffffff833edeb
                 supdrvIOCtl+0xd62
                 VBoxDrvSolarisIOCtl+0x359
                 cdev_ioctl+0x45
                 spec_ioctl+0x83
                 fop_ioctl+0x7b
                 ioctl+0x18e

THREAD           STATE    SOBJTYPE            COUNT
ffffff0007c4fc60 ONPROC   <NONE>                  1
                 default_lock_delay+0x8c
                 lock_set_spl_spin+0xc2
                 mutex_vector_enter+0x45e
                 RTSemEventSignal+0x6a
                 0xfffffffff836c57b
                 0xfffffffff836d73a
                 vboxNetFltSolarisRecv+0x331
                 VBoxNetFltSolarisModReadPut+0x107
                 putnext+0x21e
                 dld_str_rx_raw+0xb3
                 dls_rx_promisc+0x179
                 mac_promisc_dispatch_one+0x5f
                 mac_promisc_dispatch+0x105
                 mac_rx+0x3e
                 mac_rx_ring+0x4c
                 e1000g_intr+0x17e
                 av_dispatch_autovect+0x7c
                 dispatch_hardint+0x33

THREAD           STATE    SOBJTYPE            COUNT
ffffff01d6d55740 ONPROC   <NONE>                  1
                 untimeout_generic+0xaf
                 untimeout_default+0x15
                 cv_timedwait_sig_internal+0x1f8
                 cv_timedwait_sig+0x1f
                 rtSemEventMultiWait+0xe2
                 0xfffffffff8333a80
                 0xfffffffff833e60c
                 0xfffffffff833edeb
                 supdrvIOCtl+0xd62
                 VBoxDrvSolarisIOCtl+0x359
                 cdev_ioctl+0x45
                 spec_ioctl+0x83
                 fop_ioctl+0x7b
                 ioctl+0x18e

Now we can see the deadlock scenario:

- Thread A (ffffff01d6d55740) is cancelling a cv timeout while holding
  the callout table lock (taken at untimeout_generic+0x54).

- Thread B (ffffff01ccad6e60) is blocked, sleeping while it tries to get
  the callout table lock to arm a timeout. However, it is holding the
  vboxdrv spin lock (taken at rtSemEventMultiWait+0x8b) and has not yet
  had a chance to release it (at cv_timedwait_sig_internal+0x138).

- The e1000g interrupt (thread ffffff0007c4fc60) preempted thread A and
  calls RTSemEventSignal(), which needs the vboxdrv spin lock, but that
  lock is held by thread B. This makes the e1000g interrupt thread spin
  there forever on CPU1. (Only interrupts with a higher priority than
  e1000g, i.e. PIL > 6, can get scheduled on CPU1.) Because the pinned
  thread A can never run again to drop the callout table lock, thread B
  never wakes up to release the spin lock, and the cycle is closed. The
  sketch below shows the locking pattern involved.
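
To make the pattern concrete, here is a minimal sketch of the two code
paths involved. This is not the actual vboxdrv source; the names
vbox_sem_t, vbox_sem_wait() and vbox_sem_signal() are hypothetical, and
the bodies only approximate what the vboxdrv wait and signal routines
do:

#include <sys/types.h>
#include <sys/ksynch.h>
#include <sys/ddi.h>
#include <sys/sunddi.h>

/* Hypothetical semaphore object, for illustration only. */
typedef struct vbox_sem {
    kmutex_t    mtx;       /* a spin mutex if created with a high-level IPL */
    kcondvar_t  cnd;
    boolean_t   signaled;
} vbox_sem_t;

/*
 * Wait path (process context): the mutex is held around
 * cv_timedwait_sig(), and the timeout machinery (callout table lock)
 * is exercised while the caller owns it.
 */
static int
vbox_sem_wait(vbox_sem_t *s, clock_t ticks)
{
    int rc = 1;

    mutex_enter(&s->mtx);                 /* thread B holds this lock */
    while (!s->signaled && rc > 0) {
        /* Arms/cancels a timeout -> takes the callout table lock too. */
        rc = cv_timedwait_sig(&s->cnd, &s->mtx, ddi_get_lbolt() + ticks);
    }
    s->signaled = B_FALSE;
    mutex_exit(&s->mtx);
    return (rc);
}

/*
 * Signal path, reachable from interrupt context (here underneath the
 * e1000g receive path). If mtx is a spin mutex currently owned by a
 * thread that cannot run, mutex_enter() busy-waits at raised IPL.
 */
static void
vbox_sem_signal(vbox_sem_t *s)
{
    mutex_enter(&s->mtx);                 /* e1000g interrupt spins here */
    s->signaled = B_TRUE;
    cv_signal(&s->cnd);
    mutex_exit(&s->mtx);
}

With a spin mutex, that mutex_enter() in the signal path spins with the
IPL raised, which matches what we see stuck on CPU1.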

To solve this issue, vboxdrv needs to be changed to use an adaptive
lock rather than a spin lock for the cv_timedwait_sig()/cv_signal()
calls in rtSemEventMultiWait()/RTSemEventSignal().
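
As a hedged sketch of the fix (again using the hypothetical vbox_sem_t
from the sketch above, not the real vboxdrv code): with mutex_init(9F),
an interrupt priority above LOCK_LEVEL yields a spin mutex, while NULL
(or a priority at or below LOCK_LEVEL) yields an adaptive mutex:

#include <sys/ksynch.h>
#include <sys/ddi.h>
#include <sys/sunddi.h>

/* Hypothetical init routine for the vbox_sem_t sketched earlier. */
static void
vbox_sem_init(vbox_sem_t *s)
{
    /*
     * NULL as the last argument gives an adaptive mutex.  A waiter
     * blocked in cv_timedwait_sig() then simply sleeps, and an
     * interrupt thread that needs the lock in the signal path blocks
     * in mutex_enter() instead of spinning on the CPU.  (The wedged
     * lock above is a spin mutex: ::mutex shows MINSPL 11, so
     * mutex_enter() raises the IPL and busy-waits.)
     */
    mutex_init(&s->mtx, NULL, MUTEX_DRIVER, NULL);  /* adaptive, not spin */
    cv_init(&s->cnd, NULL, CV_DRIVER, NULL);
    s->signaled = B_FALSE;
}

This assumes the signal path only runs from interrupts at or below
LOCK_LEVEL (the e1000g receive path at PIL 6 qualifies); high-level
interrupt handlers cannot use adaptive mutexes.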

On the other hand, can we also improve the Solaris cv_* interfaces to
avoid this kind of issue? I've CCed Eric and Madhavan for comments.

Regards,
Kerry

Jason King wrote:
On Thu, Sep 17, 2009 at 11:55 AM, Dana H. Myers <[email protected]> wrote:
Kerry Shu wrote:
What you are looking for is 0x10, not 0x0a. It looks to me like you
have an IRQ# 16 interrupt (might be either hci1394#0, uhci#3, uhci#0, or
nvidia#0) preempting the e1000g#0 interrupt. I guess such a situation
happened frequently, since you felt the system freeze. So are you
running something that keeps both e1000g0 and the other 4 driver
instances at IRQ# 16 busy? For example, are you putting a heavy load on
both the network and graphics?
I don't understand what the VirtualBox/bridged-Ethernet configuration
does here, but one comment about xHCIs - they tend to interrupt
regularly at a reasonable rate, IIRC, even without a heavy load.
Graphics interrupts periodically but not that often; I believe it's
tied to the vertical retrace/frame rate, which is 50-100 per second.


Could it be possible that it's stuck in the e1000g ISR and the IRQ
16 preemption is just a red herring?

I'm not really familiar with the OpenSolaris interrupt implementation,
so I'm making some assumptions here (corrections are welcome) ---
looking at the stack, I'm guessing the preemption by IRQ 16 starts at
the _interrupt call.  Since xHCIs interrupt regularly, if the e1000g
ISR (or really the vboxfilt stream module sitting on top of e1000g) is
deadlocked on a mutex, isn't that what you'd see?  I'm assuming the
mutex call prior to _interrupt isn't part of the preemption process.
_______________________________________________
driver-discuss mailing list
[email protected]
http://mail.opensolaris.org/mailman/listinfo/driver-discuss
