Hi,

> I think we need to dig a bit deeper. It's good to check if spriv is valid
but there are probably other reasons than kzalloc failing.

I agree -- this small allocation is unlikely to fail in practice. Also, while my patch prevents the kernel oops, it also prevents the debugfs entries from being created.

I've been debugging this more trying to come up with a better solution, but I might need some guidance as I'm not too familiar with the USB subsystem. The immediate cause of the crash was usbmuxd sending a USBDEVFS_SETCONFIGURATION ioctl to a device, which _only if it fails_ calls usb_hcd_alloc_bandwidth to try and reset the device, which in turn calls xhci_debugfs_create_endpoint. The ioctl handler acquires a device-specific lock via usb_lock_device.

When the system resumed from hibernate, xhci_resume was called. This in turn called xhci_mem_cleanup to deallocate the device structures, which include setting the debugfs_private pointer to NULL (via xhci_free_virt_devices_depth_first). It thus seems likely that the ioctl is somehow racing with the hibernate. The call to xhci_resume is protected by a host-controller specific lock (xhci->lock) but it doesn't attempt to take the usb_lock_device device-specific lock.

Now my suspicion is that xhci_resume freed and zeroed the device structures while racing with the ioctl handler. I can't seem to find any exclusion mechanism that would prevent xhci_resume from racing with the USBDEVFS_SETCONFIGURATION ioctl (or any other ioctl, for that matter). Am I missing something? If not, is there any reason why an ioctl might need to execute in parallel with the xhci_resume? If not, can we just do a busy wait in xhci_resume until all pending ioctls have returned?

> (Added this as reply subject, please remember to send mail with proper subject)
My apologies, thanks for fixing.


On 12/07/2017 04:47 AM, Mathias Nyman wrote:
On 07.12.2017 11:26, Alexander Kappner wrote:
Date: Wed, 6 Dec 2017 15:28:37 -0800
Subject: [PATCH] usb-core: Fix potential null pointer dereference in xhci-debugfs.c

(Added this as reply subject, please remember to send mail with proper subject)


My kernel crashed just after resuming from hibernate and starting usbmuxd
(a user-space daemon for iOS device pairing) with several USB devices
connected (dmesg attached).

Backtrace leads to:

0xffffffff8170465d is in xhci_debugfs_create_endpoint
(drivers/usb/host/xhci-debugfs.c:381).
376                                       int ep_index)
377     {
378             struct xhci_ep_priv     *epriv;
379             struct xhci_slot_priv   *spriv = dev->debugfs_private;
380
381             if (spriv->eps[ep_index])
382                     return;
383
384             epriv = kzalloc(sizeof(*epriv), GFP_KERNEL);
385             if (!epriv)

The read violation happens at address 0x40 and sizeof(struct
xhci_ep_priv)=0x40, so it seems ep_index is 1 and spriv is NULL here.


Thanks for reporting and debugging this, much appreciated.

spriv gets allocated in xhci_debugfs_create_slot:

...
priv = kzalloc(sizeof(*priv), GFP_KERNEL);
if (!priv)
     return;
...

There's no separate error path if this allocation fails, so we might be
left with NULL in priv. Subsequent users of priv thus need to check for
this NULL - so this is what the patch does.


I think we need to dig a bit deeper. It's good to check if spriv is valid
but there are probably other reasons than kzalloc failing.

Resume from hibernation will reset the xhci controller, reinitializing a lot.
It could be related.

-Mathias

(keeping rest of message as reference)

There might be other ways of triggering this null pointer dereference,
including when xhci_resume frees the device structures (e.g. after
returning from a hibernate), but I wasn't able to find or reproduce it.

[63953.758083] BUG: unable to handle kernel NULL pointer dereference at
0000000000000040
[63953.758090] IP: xhci_debugfs_create_endpoint+0x1d/0xa0
[63953.758091] PGD bb911d067 P4D bb911d067 PUD 10500ff067 PMD 0
[63953.758093] Oops: 0000 [#1] PREEMPT SMP
[63953.758094] Modules linked in: ipheth tun nvidia_modeset(PO) iwlmvm
mac80211 iwlwifi nvidia(PO) btusb btrtl btbcm btintel bluetooth cfg80211
qmi_wwan ecdh_generic thinkpad_acpi rfkill
[63953.758103] CPU: 4 PID: 27091 Comm: usbmuxd Tainted: P           O
4.14.0.1-12769-g1deab8c #1
[63953.758104] Hardware name: LENOVO 20ENCTO1WW/20ENCTO1WW, BIOS N1EET62W
(1.35 ) 11/10/2016
[63953.758105] task: ffff8810527ba0c0 task.stack: ffffc9000a8ec000
[63953.758107] RIP: 0010:xhci_debugfs_create_endpoint+0x1d/0xa0
[63953.758108] RSP: 0018:ffffc9000a8efc80 EFLAGS: 00010206
[63953.758109] RAX: 0000000000000000 RBX: ffff88105a71c000 RCX:
0000000000030000
[63953.758110] RDX: 0000000000000003 RSI: ffff880c0b57e000 RDI:
ffff88105a71c238
[63953.758110] RBP: 0000000000000003 R08: ffff881063800600 R09:
0000000000000003
[63953.758111] R10: ffff88105a71c238 R11: 0000000000000001 R12:
0000000000000011
[63953.758112] R13: 0000000000000018 R14: 0000000000000000 R15:
0000000000000000
[63953.758113] FS:  00007f0a77715700(0000) GS:ffff8810a3d00000(0000)
knlGS:0000000000000000
[63953.758114] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[63953.758115] CR2: 0000000000000040 CR3: 00000003f91a8006 CR4:
00000000003606e0
[63953.758115] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
0000000000000000
[63953.758116] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
0000000000000400
[63953.758117] Call Trace:
[63953.758120]  xhci_add_endpoint+0x127/0x2b0
[63953.758123]  usb_hcd_alloc_bandwidth+0x1ad/0x300
[63953.758125]  usb_set_configuration+0x1c8/0x880
[63953.758128]  usbdev_do_ioctl+0xc41/0x1120
[63953.758130]  usbdev_ioctl+0xa/0x10
[63953.758151]  do_vfs_ioctl+0x8b/0x5c0
[63953.758153]  ? __fget+0x6c/0xb0
[63953.758155]  SyS_ioctl+0x76/0x90
[63953.758157]  do_syscall_64+0x6b/0x290
[63953.758173]  entry_SYSCALL64_slow_path+0x25/0x25
[63953.758175] RIP: 0033:0x7f0a76a151c7
[63953.758175] RSP: 002b:00007ffd1431b0c8 EFLAGS: 00000202 ORIG_RAX:
0000000000000010
[63953.758177] RAX: ffffffffffffffda RBX: 00000000023239a0 RCX:
00007f0a76a151c7
[63953.758178] RDX: 00007ffd1431b0dc RSI: 0000000080045505 RDI:
000000000000000e
[63953.758178] RBP: 00000000023240c0 R08: 00007ffd1431b008 R09:
0000000000000004
[63953.758179] R10: 00007ffd1431aec0 R11: 0000000000000202 R12:
00000000023240c0
[63953.758180] R13: 0000000000000001 R14: 0000000000000056 R15:
0000000000000038
[63953.758182] Code: e9 39 ff ff ff 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 41 57 41 56 41 55 41 54 55 48 63 ea 53 4c 8b b6 88 15 00 00 4d 8d 2c ee
<49> 83 7d 28 00 74 0b 5b 5d 41 5c 41 5d 41 5e 41 5f c3 48 8b 3d
[63953.758202] RIP: xhci_debugfs_create_endpoint+0x1d/0xa0 RSP:
ffffc9000a8efc80
[63953.758203] CR2: 0000000000000040
[63953.758204] ---[ end trace 1f7ea9a959f02054 ]---

Signed-off-by: Alexander Kappner <a...@godking.net>
---
  drivers/usb/host/xhci-debugfs.c | 3 +++
  1 file changed, 3 insertions(+)

diff --git a/drivers/usb/host/xhci-debugfs.c b/drivers/usb/host/xhci-debugfs.c
index 4f7895d..1cea59c 100644
--- a/drivers/usb/host/xhci-debugfs.c
+++ b/drivers/usb/host/xhci-debugfs.c
@@ -378,6 +378,9 @@ void xhci_debugfs_create_endpoint(struct xhci_hcd *xhci,
      struct xhci_ep_priv    *epriv;
      struct xhci_slot_priv    *spriv = dev->debugfs_private;
  +    if (!spriv)
+        return;
+
      if (spriv->eps[ep_index])
          return;


--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Reply via email to