This bug was fixed in the package linux - 4.4.0-42.62

---------------
linux (4.4.0-42.62) xenial; urgency=low

  * Fix GRO recursion overflow for tunneling protocols (LP: #1631287)
    - tunnels: Don't apply GRO to multiple layers of encapsulation.
    - gro: Allow tunnel stacking in the case of FOU/GUE

  * CVE-2016-7039
    - SAUCE: net: add recursion limit to GRO

linux (4.4.0-41.61) xenial; urgency=low

  [ Kamal Mostafa ]

  * Release Tracking Bug
    - LP: #1628204

  * nvme drive probe failure (LP: #1626894)
    - (fix) NVMe: Don't unmap controller registers on reset

linux (4.4.0-40.60) xenial; urgency=low

  [ Kamal Mostafa ]

  * Release Tracking Bug
    - LP: #1627074

  * Permission denied in CIFS with kernel 4.4.0-38 (LP: #1626112)
    - Fix memory leaks in cifs_do_mount()
    - Compare prepaths when comparing superblocks
    - SAUCE: Fix regression which breaks DFS mounting

  * Backlight does not change when adjust it higher than 50% after S3
    (LP: #1625932)
    - SAUCE: i915_bpo: drm/i915/backlight: setup and cache pwm alternate
      increment value
    - SAUCE: i915_bpo: drm/i915/backlight: setup backlight pwm alternate
      increment on backlight enable

linux (4.4.0-39.59) xenial; urgency=low

  [ Joseph Salisbury ]

  * Release Tracking Bug
    - LP: #1625303

  * thunder: chip errata w/ multiple CQEs for a TSO packet (LP: #1624569)
    - net: thunderx: Fix for issues with multiple CQEs posted for a TSO packet

  * thunder: faulty TSO padding (LP: #1623627)
    - net: thunderx: Fix for HW issue while padding TSO packet

  * CVE-2016-6828
    - tcp: fix use after free in tcp_xmit_retransmit_queue()

  * Sennheiser Officerunner - cannot get freq at ep 0x83 (LP: #1622763)
    - SAUCE: (no-up) ALSA: usb-audio: Add quirk for sennheiser officerunner

  * Backport E3 Skylake Support in ie31200_edac to Xenial (LP: #1619766)
    - EDAC, ie31200_edac: Add Skylake support

  * Ubuntu 16.04 - Full EEH Recovery Support for NVMe devices (LP: #1602724)
    - SAUCE: nvme: Don't suspend admin queue that wasn't created

  * ISST-LTE:pNV: system ben is hung during ST (nvme) (LP: #1620317)
    - blk-mq: Allow timeouts to run while queue is freezing
    - blk-mq: improve warning for running a queue on the wrong CPU
    - blk-mq: don't overwrite rq->mq_ctx

  * lsattr 32bit does not work on 64bit kernel (Inappropriate ioctl error)
    (LP: #1619918)
    - btrfs: bugfix: handle FS_IOC32_{GETFLAGS, SETFLAGS, GETVERSION} in
      btrfs_ioctl

  * radeon: monitor connected to onboard VGA doesn't work with Xenial
    (LP: #1600092)
    - drm/radeon/dp: add back special handling for NUTMEG

  * initramfs includes qle driver, but not firmware (LP: #1623187)
    - qed: add MODULE_FIRMWARE()

  * [Hyper-V] Rebase Hyper-V to 4.7.2 (stable) (LP: #1616677)
    - hv_netvsc: Implement support for VF drivers on Hyper-V
    - hv_netvsc: Fix the list processing for network change event
    - Drivers: hv: vmbus: Introduce functions for estimating room in the ring
      buffer
    - Drivers: hv: vmbus: Use READ_ONCE() to read variables that are volatile
    - Drivers: hv: vmbus: Export the vmbus_set_event() API
    - locking/barriers, arch: Use smp barriers in smp_store_release()
    - asm-generic: guard smp_store_release/load_acquire
    - x86: reuse asm-generic/barrier.h
    - asm-generic: add __smp_xxx wrappers
    - x86: define __smp_xxx
    - asm-generic: implement virt_xxx memory barriers
    - Drivers: hv: vmbus: Move some ring buffer functions to hyperv.h
    - Drivers: hv: vmbus: Implement APIs to support "in place" consumption of
      vmbus packets
    - drivers:hv: Lock access to hyperv_mmio resource tree
    - drivers:hv: Make a function to free mmio regions through vmbus
    - drivers:hv: Track allocations of children of hv_vmbus in private resource
      tree
    - drivers:hv: Separate out frame buffer logic when picking MMIO range
    - Drivers: hv: vmbus: handle various crash scenarios
    - Drivers: hv: balloon: don't crash when memory is added in non-sorted order
    - Drivers: hv: balloon: reset host_specified_ha_region
    - tools: hv: lsvmbus: add pci pass-through UUID
    - hv_netvsc: move start_remove flag to net_device_context
    - hv_netvsc: use start_remove flag to protect netvsc_link_change()
    - hv_netvsc: untangle the pointer mess
    - hv_netvsc: get rid of struct net_device pointer in struct netvsc_device
    - hv_netvsc: synchronize netvsc_change_mtu()/netvsc_set_channels() with
      netvsc_remove()
    - hv_netvsc: set nvdev link after populating chn_table
    - hv_netvsc: Fix VF register on vlan devices
    - hv_netvsc: remove redundant assignment in netvsc_recv_callback()
    - hv_netvsc: introduce {net, hv}_device_to_netvsc_device() helpers
    - hv_netvsc: pass struct netvsc_device to rndis_filter_{open, close}()
    - hv_netvsc: pass struct net_device to rndis_filter_set_device_mac()
    - hv_netvsc: pass struct net_device to rndis_filter_set_offload_params()
    - netvsc: get rid of completion timeouts
    - PCI: hv: Don't leak buffer in hv_pci_onchannelcallback()
    - PCI: hv: Handle all pending messages in hv_pci_onchannelcallback()
    - netvsc: Use the new in-place consumption APIs in the rx path
    - x86/kernel: Audit and remove any unnecessary uses of module.h
    - PCI: hv: Fix interrupt cleanup path
    - hv_netvsc: Fix VF register on bonding devices
    - hv_netvsc: don't lose VF information
    - hv_netvsc: avoid deadlocks between rtnl lock and vf_use_cnt wait
    - hv_netvsc: reset vf_inject on VF removal
    - hv_netvsc: protect module refcount by checking net_device_ctx->vf_netdev
    - hv_netvsc: fix bonding devices check in netvsc_netdev_event()
    - Drivers: hv: vmbus: Use the new virt_xx barrier code
    - ixgbevf: call ndo_stop() instead of dev_close() when running offline
      selftest
    - ixgbevf: fix error code path when setting MAC address
    - ixgbevf: use bit operations for setting and checking resets
    - ixgbevf: Add support for generic Tx checksums
    - ixgbe/ixgbevf: Add support for bulk free in Tx cleanup & cleanup boolean
      logic
    - ixgbevf: refactor ethtool stats handling
    - ixgbevf: add support for per-queue ethtool stats
    - ixgbevf: make use of BIT() macro to avoid shift of signed values
    - ixgbevf: Move API negotiation function into mac_ops
    - ixgbevf: Add the device ID's presented while running on Hyper-V
    - ixgbevf: Support Windows hosts (Hyper-V)
    - ixgbevf: Change the relaxed order settings in VF driver for sparc
    - ixgbevf: Use mac_ops instead of trying to identify NIC type

  * New device ID for Kabypoint (LP: #1622469)
    - mfd: lpss: Add Intel Kaby Lake PCH-H PCI IDs
    - SAUCE: i2c: i801: Add support for Kaby Lake PCH-H

  * Xenial update to v4.4.21 stable release (LP: #1624037)
    - Revert "i40e: fix: do not sleep in netdev_ops"
    - fs: Check for invalid i_uid in may_follow_link()
    - netfilter: x_tables: check for size overflow
    - ext4: validate that metadata blocks do not overlap superblock
    - ext4: fix xattr shifting when expanding inodes
    - ext4: fix xattr shifting when expanding inodes part 2
    - ext4: properly align shifted xattrs when expanding inodes
    - ext4: avoid deadlock when expanding inode size
    - ext4: avoid modifying checksum fields directly during checksum
      verification
    - block: Fix race triggered by blk_set_queue_dying()
    - block: make sure a big bio is split into at most 256 bvecs
    - cgroup: reduce read locked section of cgroup_threadgroup_rwsem during fork
    - s390/sclp_ctl: fix potential information leak with /dev/sclp
    - drm/radeon: fix radeon_move_blit on 32bit systems
    - drm: Reject page_flip for !DRIVER_MODESET
    - drm/msm: fix use of copy_from_user() while holding spinlock
    - ASoC: atmel_ssc_dai: Don't unconditionally reset SSC on stream startup
    - xfs: fix superblock inprogress check
    - timekeeping: Cap array access in timekeeping_debug
    - timekeeping: Avoid taking lock in NMI path with CONFIG_DEBUG_TIMEKEEPING
    - lustre: remove unused declaration
    - wrappers for ->i_mutex access
    - ovl: don't copy up opaqueness
    - ovl: remove posix_acl_default from workdir
    - ovl: listxattr: use strnlen()
    - ovl: fix workdir creation
    - ubifs: Fix assertion in layout_in_gaps()
    - bcache: RESERVE_PRIO is too small by one when prio_buckets() is a power of
      two.
    - vhost/scsi: fix reuse of &vq->iov[out] in response
    - x86/apic: Do not init irq remapping if ioapic is disabled
    - uprobes: Fix the memcg accounting
    - crypto: caam - fix IV loading for authenc (giv)decryption
    - ALSA: usb-audio: Add sample rate inquiry quirk for B850V3 CP2114
    - ALSA: firewire-tascam: accessing to user space outside spinlock
    - ALSA: fireworks: accessing to user space outside spinlock
    - ALSA: rawmidi: Fix possible deadlock with virmidi registration
    - ALSA: hda - Add headset mic quirk for Dell Inspiron 5468
    - ALSA: hda - Enable subwoofer on Dell Inspiron 7559
    - ALSA: timer: fix NULL pointer dereference in read()/ioctl() race
    - ALSA: timer: fix division by zero after SNDRV_TIMER_IOCTL_CONTINUE
    - ALSA: timer: fix NULL pointer dereference on memory allocation failure
    - scsi: fix upper bounds check of sense key in scsi_sense_key_string()
    - metag: Fix atomic_*_return inline asm constraints
    - cpufreq: Fix GOV_LIMITS handling for the userspace governor
    - hwrng: exynos - Disable runtime PM on probe failure
    - regulator: anatop: allow regulator to be in bypass mode
    - lib/mpi: mpi_write_sgl(): fix skipping of leading zero limbs
    - Linux 4.4.21

  * Headset mic detection on some variants of Dell Inspiron 5468 (LP: #1617900)
    - ALSA: hda - Add headset mic quirk for Dell Inspiron 5468

  * Xenial update to v4.4.20 stable release (LP: #1621113)
    - hugetlb: fix nr_pmds accounting with shared page tables
    - x86/mm: Disable preemption during CR3 read+write
    - uprobes/x86: Fix RIP-relative handling of EVEX-encoded instructions
    - tools/testing/nvdimm: fix SIGTERM vs hotplug crash
    - SUNRPC: Handle EADDRNOTAVAIL on connection failures
    - SUNRPC: allow for upcalls for same uid but different gss service
    - powerpc/eeh: eeh_pci_enable(): fix checking of post-request state
    - ALSA: usb-audio: Add a sample rate quirk for Creative Live! Cam Socialize
      HD (VF0610)
    - ALSA: usb-audio: Add quirk for ELP HD USB Camera
    - arm64: Define AT_VECTOR_SIZE_ARCH for ARCH_DLINFO
    - parisc: Fix order of EREFUSED define in errno.h
    - virtio: fix memory leak in virtqueue_add()
    - vfio/pci: Fix NULL pointer oops in error interrupt setup handling
    - perf intel-pt: Fix occasional decoding errors when tracing system-wide
    - libnvdimm, nd_blk: mask off reserved status bits
    - ALSA: hda - Manage power well properly for resume
    - NVMe: Don't unmap controller registers on reset
    - PCI: Support PCIe devices with short cfg_size
    - PCI: Add Netronome vendor and device IDs
    - PCI: Limit config space size for Netronome NFP6000 family
    - PCI: Add Netronome NFP4000 PF device ID
    - PCI: Limit config space size for Netronome NFP4000
    - mmc: sdhci-acpi: Reduce Baytrail eMMC/SD/SDIO hangs
    - ACPI: CPPC: Return error if _CPC is invalid on a CPU
    - ACPI / CPPC: Prevent cpc_desc_ptr points to the invalid data
    - um: Don't discard .text.exit section
    - genirq/msi: Remove unused MSI_FLAG_IDENTITY_MAP
    - genirq/msi: Make sure PCI MSIs are activated early
    - crypto: caam - fix non-hmac hashes
    - crypto: caam - fix echainiv(authenc) encrypt shared descriptor
    - crypto: caam - defer aead_set_sh_desc in case of zero authsize
    - usb: ehci: change order of register cleanup during shutdown
    - usb: misc: usbtest: add fix for driver hang
    - usb: dwc3: pci: add Intel Kabylake PCI ID
    - usb: dwc3: gadget: increment request->actual once
    - usb: hub: Fix unbalanced reference count/memory leak/deadlocks
    - USB: hub: fix up early-exit pathway in hub_activate
    - USB: hub: change the locking in hub_activate
    - usb: renesas_usbhs: clear the BRDYSTS in usbhsg_ep_enable()
    - usb: renesas_usbhs: Use dmac only if the pipe type is bulk
    - USB: validate wMaxPacketValue entries in endpoint descriptors
    - usb: gadget: fsl_qe_udc: off by one in setup_received_handle()
    - usb/gadget: fix gadgetfs aio support.
    - xhci: always handle "Command Ring Stopped" events
    - usb: xhci: Fix panic if disconnect
    - xhci: don't dereference a xhci member after removing xhci
    - USB: serial: fix memleak in driver-registration error path
    - USB: serial: option: add D-Link DWM-156/A3
    - USB: serial: option: add support for Telit LE920A4
    - USB: serial: ftdi_sio: add device ID for WICED USB UART dev board
    - USB: serial: ftdi_sio: add PIDs for Ivium Technologies devices
    - iommu/dma: Don't put uninitialised IOVA domains
    - iommu/arm-smmu: Fix CMDQ error handling
    - iommu/arm-smmu: Don't BUG() if we find aborting STEs with disable_bypass
    - pinctrl/amd: Remove the default de-bounce time
    - EDAC: Increment correct counter in edac_inc_ue_error()
    - s390/dasd: fix hanging device after clear subchannel
    - mac80211: fix purging multicast PS buffer queue
    - arm64: dts: rockchip: add reset saradc node for rk3368 SoCs
    - of: fix reference counting in of_graph_get_endpoint_by_regs
    - sched/cputime: Fix NO_HZ_FULL getrusage() monotonicity regression
    - sched/nohz: Fix affine unpinned timers mess
    - iio: fix sched WARNING "do not call blocking ops when !TASK_RUNNING"
    - drm/amdgpu: Change GART offset to 64-bit
    - drm/amdgpu: fix amdgpu_move_blit on 32bit systems
    - drm/amdgpu: avoid a possible array overflow
    - drm/amdgpu: skip TV/CV in display parsing
    - drm/amd/amdgpu: sdma resume fail during S4 on CI
    - drm/amdgpu: record error code when ring test failed
    - drm/i915: fix aliasing_ppgtt leak
    - ARC: build: Better way to detect ISA compatible toolchain
    - ARC: use correct offset in pt_regs for saving/restoring user mode r25
    - ARC: Call trace_hardirqs_on() before enabling irqs
    - ARC: Elide redundant setup of DMA callbacks
    - aacraid: Check size values after double-fetch from user
    - mfd: cros_ec: Add cros_ec_cmd_xfer_status() helper
    - i2c: cros-ec-tunnel: Fix usage of cros_ec_cmd_xfer()
    - cdc-acm: fix wrong pipe type on rx interrupt xfers
    - mpt3sas: Fix resume on WarpDrive flash cards
    - megaraid_sas: Fix probing cards without io port
    - usb: renesas_usbhs: gadget: fix return value check in
      usbhs_mod_gadget_probe()
    - gpio: Fix OF build problem on UM
    - fs/seq_file: fix out-of-bounds read
    - btrfs: waiting on qgroup rescan should not always be interruptible
    - btrfs: properly track when rescan worker is running
    - Input: tegra-kbc - fix inverted reset logic
    - Input: i8042 - break load dependency between atkbd/psmouse and i8042
    - Input: i8042 - set up shared ps2_cmd_mutex for AUX ports
    - crypto: nx - off by one bug in nx_of_update_msc()
    - crypto: qat - fix aes-xts key sizes
    - dmaengine: usb-dmac: check CHCR.DE bit in usb_dmac_isr_channel()
    - USB: avoid left shift by -1
    - usb: chipidea: udc: don't touch DP when controller is in host mode
    - USB: fix typo in wMaxPacketSize validation
    - USB: serial: mos7720: fix non-atomic allocation in write path
    - USB: serial: mos7840: fix non-atomic allocation in write path
    - USB: serial: option: add WeTelecom WM-D200
    - USB: serial: option: add WeTelecom 0x6802 and 0x6803 products
    - staging: comedi: daqboard2000: bug fix board type matching code
    - staging: comedi: comedi_test: fix timer race conditions
    - staging: comedi: ni_mio_common: fix AO inttrig backwards compatibility
    - staging: comedi: ni_mio_common: fix wrong insn_write handler
    - ACPI / drivers: fix typo in ACPI_DECLARE_PROBE_ENTRY macro
    - ACPI / drivers: replace acpi_probe_lock spinlock with mutex
    - ACPI / sysfs: fix error code in get_status()
    - ACPI / SRAT: fix SRAT parsing order with both LAPIC and X2APIC present
    - ALSA: line6: Remove double line6_pcm_release() after failed acquire.
    - ALSA: line6: Give up on the lock while URBs are released.
    - ALSA: line6: Fix POD sysfs attributes segfault
    - hwmon: (iio_hwmon) fix memory leak in name attribute
    - sysfs: correctly handle read offset on PREALLOC attrs
    - Linux 4.4.20

  * Failed to acknowledge elog: /sys/firmware/opal/elog/0x5018d709/acknowledge
    (2:No such file or directory) (LP: #1619552)
    - powerpc/powernv : Drop reference added by kset_find_obj()

  * backport support for userspace access of DP aux devices (LP: #1619756)
    - drm/dp: Add a drm_aux-dev module for reading/writing dpcd registers.
    - drm/dp: Allow signals to interrupt drm_aux-dev reads/writes
    - [Config] CONFIG_DRM_DP_AUX_CHARDEV=y

  * Enable virtual scsi server driver for Power (LP: #1615665)
    - SAUCE: Ibmvscsis: Properly deregister target sessions
    - SAUCE: Return TCMU-generated sense data to fabric module
    - SAUCE: Ibmvscsis: Code cleanup of print statements
    - SAUCE: Ibmvscsis: Fixed a bug reported by Dan Carpenter

  * ISST-LTE: system dropped into xmon at pcibios_release_device+0x5c/0x80
    during running dlpar test on monklp3 (LP: #1618151)
    - powerpc/pseries: use pci_host_bridge.release_fn() to kfree(phb)

  * Kernel Build Fails for Fuse Module (LP: #1617550)
    - SAUCE: (namespace) userns: Export current_in_userns to modules

  * boot-time kernel panic introduced in 4.4.0-18, not present in 4.4.0-15
    (LP: #1572630)
    - blk-mq: Reuse hardware context cpumask for tags
    - blk-mq: Use proper cpumask iterator

 -- Seth Forshee <seth.fors...@canonical.com>  Fri, 07 Oct 2016 12:03:55 -0500

** Changed in: linux (Ubuntu Xenial)
       Status: Fix Committed => Fix Released

** CVE added: http://www.cve.mitre.org/cgi-bin/cvename.cgi?name=2016-6828

** CVE added: http://www.cve.mitre.org/cgi-bin/cvename.cgi?name=2016-7039

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1620317

Title:
  ISST-LTE:pNV: system ben is hung during ST (nvme)

Status in linux package in Ubuntu:
  Fix Released
Status in linux source package in Xenial:
  Fix Released
Status in linux source package in Yakkety:
  Fix Released

Bug description:
  When we run I/O-intensive tasks together with CPU addition/removal,
  the block layer may hang, stalling the entire machine.

  The backtrace below is one of the symptoms:

  [12747.111149] ---[ end trace b4d8d720952460b5 ]---
  [12747.126885] Trying to free IRQ 357 from IRQ context!
  [12747.146930] ------------[ cut here ]------------
  [12747.166674] WARNING: at /build/linux-iLHNl3/linux-4.4.0/kernel/irq/manage.c:1438
  [12747.184069] Modules linked in: minix nls_iso8859_1 rpcsec_gss_krb5 auth_rpcgss nfsv4 nfs lockd grace fscache rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) configfs ib_ipoib(OE) ib_cm(OE) ib_uverbs(OE) ib_umad(OE) mlx5_ib(OE) mlx4_ib(OE) ib_sa(OE) ib_mad(OE) ib_core(OE) ib_addr(OE) mlx4_en(OE) mlx4_core(OE) binfmt_misc xfs joydev input_leds mac_hid ofpart cmdlinepart powernv_flash ipmi_powernv mtd ipmi_msghandler at24 opal_prd powernv_rng ibmpowernv uio_pdrv_genirq uio sunrpc knem(OE) autofs4 btrfs xor raid6_pq hid_generic usbhid hid uas usb_storage nouveau ast bnx2x i2c_algo_bit ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops mlx5_core(OE) ahci drm mdio libcrc32c mlx_compat(OE) libahci vxlan nvme ip6_udp_tunnel udp_tunnel
  [12747.349013] CPU: 80 PID: 0 Comm: swapper/80 Tainted: G        W  OEL  4.4.0-21-generic #37-Ubuntu
  [12747.369046] task: c000000f1fab89b0 ti: c000000f1fb6c000 task.ti: c000000f1fb6c000
  [12747.404848] NIP: c000000000131888 LR: c000000000131884 CTR: 00000000300303f0
  [12747.808333] REGS: c000000f1fb6e550 TRAP: 0700   Tainted: G        W  OEL   (4.4.0-21-generic)
  [12747.867658] MSR: 9000000100029033 <SF,HV,EE,ME,IR,DR,RI,LE>  CR: 28022222  XER: 20000000
  [12747.884783] CFAR: c000000000aea8f4 SOFTE: 1 
  GPR00: c000000000131884 c000000f1fb6e7d0 c0000000015b4200 0000000000000028 
  GPR04: c000000f2a409c50 c000000f2a41b4e0 0000000f29480000 00000000000033da 
  GPR08: 0000000000000007 c000000000f8b27c 0000000f29480000 9000000100001003 
  GPR12: 0000000000002200 c000000007b6f800 c000000f2a40a938 0000000000000100 
  GPR16: c000000f11480000 0000000000003a98 0000000000000000 0000000000000000 
  GPR20: 0000000000000000 d000000009521008 d0000000095146a0 fffffffffffff000 
  GPR24: c000000004a19ef0 0000000000000000 0000000000000003 000000000000007d 
  GPR28: 0000000000000165 c000000eefeb1800 c000000eef830600 0000000000000165 
  [12748.243270] NIP [c000000000131888] __free_irq+0x238/0x370
  [12748.254089] LR [c000000000131884] __free_irq+0x234/0x370
  [12748.269738] Call Trace:
  [12748.286740] [c000000f1fb6e7d0] [c000000000131884] __free_irq+0x234/0x370 (unreliable)
  [12748.289687] [c000000f1fb6e860] [c000000000131af8] free_irq+0x88/0xb0
  [12748.304594] [c000000f1fb6e890] [d000000009514528] nvme_suspend_queue+0xc8/0x150 [nvme]
  [12748.333825] [c000000f1fb6e8c0] [d00000000951681c] nvme_dev_disable+0x3fc/0x400 [nvme]
  [12748.340913] [c000000f1fb6e9a0] [d000000009516ae4] nvme_timeout+0xe4/0x260 [nvme]
  [12748.357136] [c000000f1fb6ea60] [c000000000548a34] blk_mq_rq_timed_out+0x64/0x110
  [12748.383939] [c000000f1fb6ead0] [c00000000054c540] bt_for_each+0x160/0x170
  [12748.399292] [c000000f1fb6eb40] [c00000000054d4e8] blk_mq_queue_tag_busy_iter+0x78/0x110
  [12748.402665] [c000000f1fb6eb90] [c000000000547358] blk_mq_rq_timer+0x48/0x140
  [12748.438649] [c000000f1fb6ebd0] [c00000000014a13c] call_timer_fn+0x5c/0x1c0
  [12748.468126] [c000000f1fb6ec60] [c00000000014a5fc] run_timer_softirq+0x31c/0x3f0
  [12748.483367] [c000000f1fb6ed30] [c0000000000beb78] __do_softirq+0x188/0x3e0
  [12748.498378] [c000000f1fb6ee20] [c0000000000bf048] irq_exit+0xc8/0x100
  [12748.501048] [c000000f1fb6ee40] [c00000000001f954] timer_interrupt+0xa4/0xe0
  [12748.516377] [c000000f1fb6ee70] [c000000000002714] decrementer_common+0x114/0x180
  [12748.547282] --- interrupt: 901 at arch_local_irq_restore+0x74/0x90
  [12748.547282]     LR = arch_local_irq_restore+0x74/0x90
  [12748.574141] [c000000f1fb6f160] [0000000000000001] 0x1 (unreliable)
  [12748.592405] [c000000f1fb6f180] [c000000000aedc3c] dump_stack+0xd0/0xf0
  [12748.596461] [c000000f1fb6f1c0] [c0000000001006fc] dequeue_task_idle+0x5c/0x90
  [12748.611532] [c000000f1fb6f230] [c0000000000f6080] deactivate_task+0xc0/0x130
  [12748.627685] [c000000f1fb6f270] [c000000000adcb10] __schedule+0x440/0x990
  [12748.654416] [c000000f1fb6f300] [c000000000add0a8] schedule+0x48/0xc0
  [12748.670558] [c000000f1fb6f330] [c000000000ae1474] schedule_timeout+0x274/0x350
  [12748.673485] [c000000f1fb6f420] [c000000000ade23c] wait_for_common+0xec/0x240
  [12748.699192] [c000000f1fb6f4a0] [c0000000000e6908] kthread_stop+0x88/0x210
  [12748.718385] [c000000f1fb6f4e0] [d000000009514240] nvme_dev_list_remove+0x90/0x110 [nvme]
  [12748.748925] [c000000f1fb6f510] [d000000009516498] nvme_dev_disable+0x78/0x400 [nvme]
  [12748.752112] [c000000f1fb6f5f0] [d000000009516ae4] nvme_timeout+0xe4/0x260 [nvme]
  [12748.775395] [c000000f1fb6f6b0] [c000000000548a34] blk_mq_rq_timed_out+0x64/0x110
  [12748.821069] [c000000f1fb6f720] [c00000000054c540] bt_for_each+0x160/0x170
  [12748.851733] [c000000f1fb6f790] [c00000000054d4e8] blk_mq_queue_tag_busy_iter+0x78/0x110
  [12748.883093] [c000000f1fb6f7e0] [c000000000547358] blk_mq_rq_timer+0x48/0x140
  [12748.918348] [c000000f1fb6f820] [c00000000014a13c] call_timer_fn+0x5c/0x1c0
  [12748.934743] [c000000f1fb6f8b0] [c00000000014a5fc] run_timer_softirq+0x31c/0x3f0
  [12748.938084] [c000000f1fb6f980] [c0000000000beb78] __do_softirq+0x188/0x3e0
  [12748.960815] [c000000f1fb6fa70] [c0000000000bf048] irq_exit+0xc8/0x100
  [12748.992175] [c000000f1fb6fa90] [c00000000001f954] timer_interrupt+0xa4/0xe0
  [12749.019299] [c000000f1fb6fac0] [c000000000002714] decrementer_common+0x114/0x180
  [12749.037168] --- interrupt: 901 at arch_local_irq_restore+0x74/0x90
  [12749.037168]     LR = arch_local_irq_restore+0x74/0x90
  [12749.079044] [c000000f1fb6fdb0] [c000000f2a41d680] 0xc000000f2a41d680 (unreliable)
  [12749.081736] [c000000f1fb6fdd0] [c000000000909a28] cpuidle_enter_state+0x1a8/0x410
  [12749.127094] [c000000f1fb6fe30] [c000000000119a88] call_cpuidle+0x78/0xd0
  [12749.144435] [c000000f1fb6fe70] [c000000000119e5c] cpu_startup_entry+0x37c/0x480
  [12749.166156] [c000000f1fb6ff30] [c00000000004563c] start_secondary+0x33c/0x360
  [12749.186929] [c000000f1fb6ff90] [c000000000008b6c] start_secondary_prolog+0x10/0x14
  [12749.223828] Instruction dump:
  [12749.223856] 4e800020 4bf83a5d 60000000 4bffff64 4bf83a51 60000000 4bffffa8 3c62ff7b
  [12749.233245] 7f84e378 38630fe0 489b900d 60000000 <0fe00000> 4bfffe20 7d2903a6 387d0118
  [12749.298371] ---[ end trace b4d8d720952460b6 ]---

  == Comment: #184 - Gabriel Krisman Bertazi <gbert...@br.ibm.com> - 2016-07-29 12:55:48 ==
  I got it figured out.  The nvme driver is not playing nicely with the
  block timeout infrastructure: the timeout code goes into a livelock,
  waiting for the queue to be released.  CPU hotplug, on the other hand,
  which is holding the queue freeze lock at the time, is waiting for an
  outstanding request to time out (or complete).  This request, in turn,
  is stuck in the device and requires a reset triggered by a timeout,
  which never happens because of the livelock.

  I don't know why the request is stuck inside the device and requires
  a timeout, but this could even be caused by the Leaf firmware itself.
  I also see some successful timeouts triggered under normal conditions.
  In the failure event, we should be able to abort the request normally,
  but this happens via the timeout infrastructure, which is blocked
  during CPU hotplug events.

  I have a quirk to fully recover after the failure by forcing a reset
  of the stuck I/O, which allows the CPU hotplug to complete and the
  block layer to recover.  I have a machine hitting the failure every
  few minutes in a loop, and recovering from it with my patch.

  Patch submitted to linux-block

  https://marc.info/?l=linux-block&m=146976739016592&w=2

  == Comment: #207 - Gabriel Krisman Bertazi <gbert...@br.ibm.com> - 2016-09-05 09:13:51 ==
  Canonical,

  This is fixed by:

  e57690fe009b ("blk-mq: don't overwrite rq->mq_ctx")
  0e87e58bf60e ("blk-mq: improve warning for running a queue on the wrong CPU")
  71f79fb3179e ("blk-mq: Allow timeouts to run while queue is freezing")

  Which will apply cleanly on top of your kernel.
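
A rough illustration of the hang analysed in comment #184. This is a toy
Python model, not kernel code. It assumes a queue whose freeze can only
finish once every in-flight request has dropped its reference, and a timeout
pass that either refuses to run while a freeze is in progress (roughly the
situation before the fixes) or is allowed to run (roughly the situation after
"blk-mq: Allow timeouts to run while queue is freezing"). ToyQueue,
timeout_tick and simulate are hypothetical names made up for this sketch.

```python
"""Toy model of the hang: a queue freeze waits for in-flight requests to
drain, while the timeout pass that could abort a stuck request is refused
during the freeze. All names are hypothetical, not kernel APIs."""

from dataclasses import dataclass


@dataclass
class ToyQueue:
    in_flight: int = 0      # requests still holding a usage reference
    freezing: bool = False  # set once a freeze (e.g. CPU hotplug) starts

    def frozen(self) -> bool:
        # The freeze completes only when every reference has been dropped.
        return self.freezing and self.in_flight == 0


def timeout_tick(q: ToyQueue, allow_during_freeze: bool) -> bool:
    """Run one timeout pass; return True if a stuck request was aborted."""
    if q.freezing and not allow_during_freeze:
        # Modelled old behaviour: the pass gives up while a freeze is
        # pending, so nothing is ever expired.
        return False
    if q.in_flight > 0:
        q.in_flight -= 1    # abort the stuck request, dropping its reference
        return True
    return False


def simulate(allow_during_freeze: bool, ticks: int = 5) -> bool:
    """Return True if the freeze completes within `ticks` timeout passes."""
    q = ToyQueue(in_flight=1)   # one request stuck in the device
    q.freezing = True           # the freeze starts before the timeout fires
    for _ in range(ticks):
        if q.frozen():
            return True
        timeout_tick(q, allow_during_freeze)
    return q.frozen()


if __name__ == "__main__":
    print("timeouts blocked during freeze:", simulate(False))
    print("timeouts allowed during freeze:", simulate(True))
```

In this model, with timeouts blocked the freeze never completes however many
passes run, because the only thing that could drop the stuck request's
reference is the pass that keeps bailing out. With timeouts allowed, the
stuck request is aborted on the first pass and the freeze then completes.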

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1620317/+subscriptions

-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp
