Hi, I've been tracking down two issues and one of them seems to be a problem with either usbcore or xhci.
DWC3, when acting as host, instantiates an xhci platform-device and sets
itself as the parent of that. That's all fine and dandy until I try to
modprobe -r dwc3.ko which causes XHCI to hang:
| # lsmod
| Module Size Used by
| xhci_hcd 116180 0
| dwc3 46765 0
| udc_core 10472 1 dwc3
| dwc3_omap 5402 0
| matrix_keypad 7218 0
| lis3lv02d_i2c 3718 0
| lis3lv02d 16439 1 lis3lv02d_i2c
| input_polldev 5315 1 lis3lv02d
| # lsusb
| Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
| Bus 001 Device 005: ID 0b95:7720 ASIX Electronics Corp. AX88772
| Bus 001 Device 004: ID 1a40:0101 Terminus Technology Inc. 4-Port HUB
| Bus 001 Device 003: ID 0403:6001 Future Technology Devices International, Ltd FT232 USB-Serial (UART) IC
| Bus 001 Device 002: ID 1a40:0201 Terminus Technology Inc. FE 2.1 7-port Hub
| Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
| # modprobe -r dwc3
| [ 53.016798] xhci-hcd xhci-hcd.0.auto: remove, state 4
| [ 53.023083] usb usb2: USB disconnect, device number 1
| [ 53.082845] xhci-hcd xhci-hcd.0.auto: Host not halted after 16000 microseconds.
| [ 53.090732] xhci-hcd xhci-hcd.0.auto: USB bus 2 deregistered
| [ 53.112511] xhci-hcd xhci-hcd.0.auto: remove, state 1
| [ 53.117883] usb usb1: USB disconnect, device number 1
| [ 53.123301] usb 1-1: USB disconnect, device number 2
| [ 53.128503] usb 1-1.6: USB disconnect, device number 3
| [ 90.539781] INFO: task modprobe:1792 blocked for more than 30 seconds.
| [ 90.546607] Not tainted 3.17.0-rc2-00004-ge0b64425 #800
| [ 90.552672] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
| [ 90.560855] modprobe D c06bf5a0 0 1792 1662 0x00000000
| [ 90.567541] [<c06bf5a0>] (__schedule) from [<c06bfa94>] (schedule+0x40/0x8c)
| [ 90.574925] [<c06bfa94>] (schedule) from [<c06c3e48>] (schedule_timeout+0x154/0x220)
| [ 90.583031] [<c06c3e48>] (schedule_timeout) from [<c06c0554>] (wait_for_common+0xdc/0x178)
| [ 90.591672] [<c06c0554>] (wait_for_common) from [<c06c0610>] (wait_for_completion+0x20/0x24)
| [ 90.600537] [<c06c0610>] (wait_for_completion) from [<bf0569d4>] (xhci_configure_endpoint+0xc8/0x590 [xhci_hcd])
| [ 90.611226] [<bf0569d4>] (xhci_configure_endpoint [xhci_hcd]) from [<bf057664>] (xhci_check_bandwidth+0x16c/0x294 [xhci_hcd])
| [ 90.623100] [<bf057664>] (xhci_check_bandwidth [xhci_hcd]) from [<c04e5578>] (usb_hcd_alloc_bandwidth+0x1dc/0x320)
| [ 90.633938] [<c04e5578>] (usb_hcd_alloc_bandwidth) from [<c04e8160>] (usb_disable_device+0x198/0x1f8)
| [ 90.643586] [<c04e8160>] (usb_disable_device) from [<c04df3fc>] (usb_disconnect+0x7c/0x224)
| [ 90.652323] [<c04df3fc>] (usb_disconnect) from [<c04df54c>] (usb_disconnect+0x1cc/0x224)
| [ 90.660778] 8 locks held by modprobe/1792:
| [ 90.665055] #0: (&dev->mutex){......}, at: [<c0439c04>] driver_detach+0x54/0xc8
| [ 90.672929] #1: (&dev->mutex){......}, at: [<c0439c10>] driver_detach+0x60/0xc8
| [ 90.680798] #2: (&dev->mutex){......}, at: [<c0439524>] device_release_driver+0x28/0x3c
| [ 90.689373] #3: (usb_bus_list_lock){+.+.+.}, at: [<c04e4e04>] usb_remove_hcd+0xa0/0x1b4
| [ 90.697971] #4: (&dev->mutex){......}, at: [<c04df3d0>] usb_disconnect+0x50/0x224
| [ 90.706022] #5: (&dev->mutex){......}, at: [<c04df3d0>] usb_disconnect+0x50/0x224
| [ 90.714069] #6: (&dev->mutex){......}, at: [<c04df3d0>] usb_disconnect+0x50/0x224
| [ 90.722109] #7: (hcd->bandwidth_mutex){+.+.+.}, at: [<c04e814c>] usb_disable_device+0x184/0x1f8
This only happens when I have devices attached to the XHCI port on my
platform (AM437x), but I suppose any XHCI would die similarly if you can
destroy the underlying {platform,pci}_device.
If I first remove xhci then remove dwc3, it works fine:
| # lsmod
| Module Size Used by
| xhci_hcd 116180 0
| dwc3 46765 0
| udc_core 10472 1 dwc3
| matrix_keypad 7218 0
| dwc3_omap 5402 0
| lis3lv02d_i2c 3718 0
| lis3lv02d 16439 1 lis3lv02d_i2c
| input_polldev 5315 1 lis3lv02d
| # lsusb
| Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
| Bus 001 Device 005: ID 0b95:7720 ASIX Electronics Corp. AX88772
| Bus 001 Device 004: ID 1a40:0101 Terminus Technology Inc. 4-Port HUB
| Bus 001 Device 003: ID 0403:6001 Future Technology Devices International, Ltd FT232 USB-Serial (UART) IC
| Bus 001 Device 002: ID 1a40:0201 Terminus Technology Inc. FE 2.1 7-port Hub
| Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
| # modprobe -r xhci-hcd
| [ 38.895745] xhci-hcd xhci-hcd.0.auto: remove, state 4
| [ 38.902034] usb usb2: USB disconnect, device number 1
| [ 38.933439] xhci-hcd xhci-hcd.0.auto: USB bus 2 deregistered
| [ 38.945408] xhci-hcd xhci-hcd.0.auto: remove, state 1
| [ 38.950968] usb usb1: USB disconnect, device number 1
| [ 38.956280] usb 1-1: USB disconnect, device number 2
| [ 38.961563] usb 1-1.6: USB disconnect, device number 3
| [ 38.980267] usb 1-1.7: USB disconnect, device number 4
| [ 38.985710] usb 1-1.7.4: USB disconnect, device number 5
| [ 38.994068] asix 1-1.7.4:1.0 eth1: unregister 'asix' usb-xhci-hcd.0.auto-1.7.4, ASIX AX88772 USB 2.0 Ethernet
| [ 39.122913] xhci-hcd xhci-hcd.0.auto: USB bus 1 deregistered
| # modprobe -r dwc3
| #
It also works fine if I don't have anything attached to the XHCI port:
| # lsmod
| Module Size Used by
| xhci_hcd 116180 0
| dwc3 46765 0
| udc_core 10472 1 dwc3
| matrix_keypad 7218 0
| dwc3_omap 5402 0
| lis3lv02d_i2c 3718 0
| lis3lv02d 16439 1 lis3lv02d_i2c
| input_polldev 5315 1 lis3lv02d
| # lsusb
| Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
| Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
| # modprobe -r dwc3
| [ 63.910052] xhci-hcd xhci-hcd.0.auto: remove, state 4
| [ 63.915429] usb usb2: USB disconnect, device number 1
| [ 63.959522] xhci-hcd xhci-hcd.0.auto: Host not halted after 16000 microseconds.
| [ 63.967461] xhci-hcd xhci-hcd.0.auto: USB bus 2 deregistered
| [ 63.981720] xhci-hcd xhci-hcd.0.auto: remove, state 4
| [ 63.987160] usb usb1: USB disconnect, device number 1
| [ 64.006709] xhci-hcd xhci-hcd.0.auto: USB bus 1 deregistered
If you want to know, this is running v3.17-rc2, but I know that at least
v3.14 also exhibits the same problem. Any suggestions on how to get this
thing sorted out? I'm pretty much running out of ideas :-s
The second problem is exposed because I reverted commit c5a1fbc
(usb: dwc3: dwc3-omap: Fix the crash on module removal). That fix is
wrong: it had the side effect of modprobe -r dwc3-omap *NOT* destroying
the platform_device for dwc3.ko, so dwc3.ko would never be unprobed
and its resources would not be released.
I traced this one down to __release_resource() getting a NULL pointer
dereference when grabbing a pointer to old->parent->child, but I can't
seem to figure out exactly what is wrong there. It doesn't seem, to me,
that old->parent or old->parent->child should ever be NULL... Any ideas?
| # modprobe -r dwc3-omap
| [ 539.835401] Unable to handle kernel NULL pointer dereference at virtual address 00000018
| [ 539.844043] pgd = eb83c000
| [ 539.846893] [00000018] *pgd=00000000
| [ 539.850734] Internal error: Oops: 5 [#1] SMP ARM
| [ 539.855588] Modules linked in: xhci_hcd matrix_keypad dwc3_omap(-) lis3lv02d_i2c lis3lv02d input_polldev [last unloaded: udc_core]
| [ 539.867977] CPU: 0 PID: 1878 Comm: modprobe Not tainted 3.17.0-rc2-00004-ge0b64425 #800
| [ 539.876384] task: ed0d4040 ti: ed07c000 task.ti: ed07c000
| [ 539.882076] PC is at release_resource+0x24/0x90
| [ 539.886847] LR is at lock_acquired+0x280/0x3b8
| [ 539.891509] pc : [<c004eba8>] lr : [<c0091f8c>] psr: 60000013
| [ 539.891509] sp : ed07ddf0 ip : ed07dd80 fp : ed07de04
| [ 539.903570] r10: 00000000 r9 : ed07c000 r8 : c000f064
| [ 539.909061] r7 : 00000081 r6 : c0577eec r5 : ed564c00 r4 : eb97da80
| [ 539.915900] r3 : 00000000 r2 : 00000000 r1 : 60000013 r0 : c004eba4
| [ 539.922740] Flags: nZCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment user
| [ 539.930238] Control: 10c5387d Table: ab83c059 DAC: 00000015
| [ 539.936274] Process modprobe (pid: 1878, stack limit = 0xed07c248)
| [ 539.942751] Stack: (0xed07ddf0 to 0xed07e000)
| [ 539.947324] dde0: 00000001 ed564c00 ed07de1c ed07de08
| [ 539.955897] de00: c043b670 c004eb90 ed564c00 00000000 ed07de34 ed07de20 c043b6bc c043b600
| [ 539.964476] de20: c0c55528 ed564c10 ed07de4c ed07de38 c0577f78 c043b6ac ed564c10 00000000
| [ 539.973065] de40: ed07de74 ed07de50 c0435ac4 c0577ef8 ed20eb40 ed487578 ed07de84 ed20f410
| [ 539.981649] de60: ed210010 ed210044 ed07de84 ed07de78 c0577ee4 c0435a7c ed07de9c ed07de88
| [ 539.990206] de80: bf013310 c0577ed0 ed210010 bf013e6c ed07deac ed07dea0 c043b010 bf0132c4
| [ 539.998764] dea0: ed07dec4 ed07deb0 c04394a8 c043aff4 ed210010 bf013e6c ed07dee4 ed07dec8
| [ 540.007339] dec0: c0439c74 c0439434 ed0d4040 bf013e6c 00000000 00000800 ed07defc ed07dee8
| [ 540.015914] dee0: c04391a4 c0439bbc bf01391c bf013e6c ed07df14 ed07df00 c043a4e4 c0439154
| [ 540.024498] df00: bf01391c bf013eb0 ed07df24 ed07df18 c043b7c4 c043a4b8 ed07df34 ed07df28
| [ 540.033082] df20: bf013930 c043b7b4 ed07dfa4 ed07df38 c00cab3c bf013928 ed07df54 00000000
| [ 540.041650] df40: bf013eb0 00000800 ed07df3c 33637764 616d6f5f 00000070 ed07df84 ed07df68
| [ 540.050246] df60: c00906a4 c00904ec b7007220 b7007254 00000000 00000081 ed07df94 ed07df88
| [ 540.058818] df80: c00907fc 00090584 00000000 b7007220 b7007254 00000000 00000000 ed07dfa8
| [ 540.067412] dfa0: c000ede0 c00caa28 b7007220 b7007254 b7007254 00000800 b7006000 000254b8
| [ 540.076003] dfc0: b7007220 b7007254 00000000 00000081 b7007254 00000001 b7007008 b70072b0
| [ 540.084595] dfe0: b6f31420 be99b76c b6feff98 b6f3142c 60000010 b7007254 ed064e2b 50b60016
| [ 540.093219] [<c004eba8>] (release_resource) from [<c043b670>] (platform_device_del+0x7c/0xac)
| [ 540.102181] [<c043b670>] (platform_device_del) from [<c043b6bc>] (platform_device_unregister+0x1c/0x30)
| [ 540.112048] [<c043b6bc>] (platform_device_unregister) from [<c0577f78>] (of_platform_device_destroy+0x8c/0x98)
| [ 540.122557] [<c0577f78>] (of_platform_device_destroy) from [<c0435ac4>] (device_for_each_child+0x54/0x80)
| [ 540.132612] [<c0435ac4>] (device_for_each_child) from [<c0577ee4>] (of_platform_depopulate+0x20/0x28)
| [ 540.142312] [<c0577ee4>] (of_platform_depopulate) from [<bf013310>] (dwc3_omap_remove+0x58/0x78 [dwc3_omap])
| [ 540.152634] [<bf013310>] (dwc3_omap_remove [dwc3_omap]) from [<c043b010>] (platform_drv_remove+0x28/0x2c)
| [ 540.162665] [<c043b010>] (platform_drv_remove) from [<c04394a8>] (__device_release_driver+0x80/0xd4)
| [ 540.172233] [<c04394a8>] (__device_release_driver) from [<c0439c74>] (driver_detach+0xc4/0xc8)
| [ 540.181251] [<c0439c74>] (driver_detach) from [<c04391a4>] (bus_remove_driver+0x5c/0xb0)
| [ 540.189750] [<c04391a4>] (bus_remove_driver) from [<c043a4e4>] (driver_unregister+0x38/0x58)
| [ 540.198601] [<c043a4e4>] (driver_unregister) from [<c043b7c4>] (platform_driver_unregister+0x1c/0x20)
| [ 540.208274] [<c043b7c4>] (platform_driver_unregister) from [<bf013930>] (dwc3_omap_driver_exit+0x14/0x1c [dwc3_omap])
| [ 540.219407] [<bf013930>] (dwc3_omap_driver_exit [dwc3_omap]) from [<c00cab3c>] (SyS_delete_module+0x120/0x1b0)
| [ 540.229943] [<c00cab3c>] (SyS_delete_module) from [<c000ede0>] (ret_fast_syscall+0x0/0x48)
| [ 540.238617] Code: e1a04000 e59f006c eb19da12 e5943010 (e5932018)
| [ 540.245128] ---[ end trace ee0e6e3f9c9ba6ac ]---
| [ 540.249985] note: modprobe[1878] exited with preempt_count 1
| Segmentation fault
| #
FYI, PC dies at line 241 of kernel/resource.c:
| (gdb) l *(release_resource + 0x24)
| 0xc004eba8 is in release_resource (kernel/resource.c:241).
| 236 {
| 237 struct resource *tmp, **p;
| 238
| 239 p = &old->parent->child;
| 240 for (;;) {
| 241 tmp = *p;
| 242 if (!tmp)
| 243 break;
| 244 if (tmp == old) {
| 245 *p = tmp->sibling;
Based on that, either old->parent or old->parent->child is NULL. But
considering that the faulting virtual address is 0x18 (a 24-byte offset),
that would be, if I calculated correctly, the offset of child inside
struct resource. So parent is NULL and &NULL->child = 0x18.
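
For anyone who wants to double-check that arithmetic, here is a minimal
userspace sketch. It assumes a 32-bit resource_size_t (i.e. no LPAE on this
ARM build) and the usual field order of struct resource from
include/linux/ioport.h. The name struct resource32 and the two helper
functions are made up for illustration; pointers and unsigned long are
modelled as uint32_t so the layout comes out the same on any build host.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical mirror of struct resource as laid out on 32-bit ARM.
 * Assumption: resource_size_t is 32 bits wide (no LPAE).
 */
struct resource32 {
	uint32_t start;   /* resource_size_t start */
	uint32_t end;     /* resource_size_t end */
	uint32_t name;    /* const char *name */
	uint32_t flags;   /* unsigned long flags */
	uint32_t parent;  /* struct resource *parent */
	uint32_t sibling; /* struct resource *sibling */
	uint32_t child;   /* struct resource *child */
};

/* Offset read when loading old->parent. */
static size_t parent_offset(void)
{
	return offsetof(struct resource32, parent);
}

/* Offset added to old->parent when forming &old->parent->child. */
static size_t child_offset(void)
{
	return offsetof(struct resource32, child);
}

/* Compile-time check that child sits 0x18 bytes into the structure. */
static_assert(offsetof(struct resource32, child) == 0x18,
	      "child is expected at offset 0x18 under these assumptions");
```

Under those assumptions child_offset() is 0x18, which matches the faulting
address, and r3 is 00000000 in the register dump above, so a NULL
old->parent looks like the consistent explanation.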
cheers
--
balbi
