Would it be possible for you to test the latest upstream kernel? Refer
to https://wiki.ubuntu.com/KernelMainlineBuilds . Please test the latest
v4.11 kernel[0].

If this bug is fixed in the mainline kernel, please add the tag
'kernel-fixed-upstream'.

If the mainline kernel does not fix this bug, please add the tag:
'kernel-bug-exists-upstream'.

Once testing of the upstream kernel is complete, please mark this bug as
"Confirmed".


Thanks in advance.

[0] http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.11-rc2

** Changed in: linux (Ubuntu)
   Importance: Undecided => High

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1672521

Title:
  ThunderX: soft lockup on 4.8+ kernels

Status in linux package in Ubuntu:
  Triaged
Status in linux source package in Yakkety:
  Triaged
Status in linux source package in Zesty:
  Triaged

Bug description:
  I spent several days looking for an easy way to reproduce this.
  We initially observed it in OPNFV Armband, when we tried to upgrade the
  kernel of our Ubuntu Xenial installation to linux-image-generic-hwe-16.04
  (4.8).

  In our environment, this was easily triggered on compute nodes when
  launching multiple VMs (we suspected OVS, QEMU etc.).
  However, in order to rule out anything specific to our setup, we looked for
  a simple way to reproduce it on all ThunderX nodes we have access to, and we
  finally found one:

  $ apt-get install stress-ng
  $ stress-ng --hdd 1024

  We tested different FW versions, provided by both the chip and the board
  manufacturers, and with all of them the result is 100% reproducible,
  leading to a kernel Oops [1]:
  [  726.070531] INFO: task kworker/0:1:312 blocked for more than 120 seconds.
  [  726.077908]       Tainted: G        W I     4.8.0-41-generic #44~16.04.1-Ubuntu
  [  726.085850] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [  726.094383] kworker/0:1     D ffff0000080861bc     0   312      2 0x00000000
  [  726.094401] Workqueue: events vmstat_shepherd
  [  726.094404] Call trace:
  [  726.094411] [<ffff0000080861bc>] __switch_to+0x94/0xa8
  [  726.094418] [<ffff0000089854f4>] __schedule+0x224/0x718
  [  726.094421] [<ffff000008985a20>] schedule+0x38/0x98
  [  726.094425] [<ffff000008985d84>] schedule_preempt_disabled+0x14/0x20
  [  726.094428] [<ffff000008987644>] __mutex_lock_slowpath+0xd4/0x168
  [  726.094431] [<ffff000008987730>] mutex_lock+0x58/0x70
  [  726.094437] [<ffff0000080c552c>] get_online_cpus+0x44/0x70
  [  726.094440] [<ffff00000820ca24>] vmstat_shepherd+0x3c/0xe8
  [  726.094446] [<ffff0000080e1c60>] process_one_work+0x150/0x478
  [  726.094449] [<ffff0000080e1fd8>] worker_thread+0x50/0x4b8
  [  726.094453] [<ffff0000080e8eac>] kthread+0xec/0x100
  [  726.094456] [<ffff000008083690>] ret_from_fork+0x10/0x40
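
  To check another node for the same symptom, the kernel log can be filtered
  for hung-task reports like the one above. What follows is only a minimal
  sketch and not something taken from our setup: the function name hung_tasks
  is hypothetical, and it assumes the "INFO: task ... blocked for more than N
  seconds" message format shown in the trace.

```shell
#!/bin/sh
# Hypothetical helper (not part of the original report): read kernel log text
# on stdin and print "<task> <seconds>" for every hung-task report found.
# Assumes the message format seen in the trace above, e.g.
#   [  726.070531] INFO: task kworker/0:1:312 blocked for more than 120 seconds.
hung_tasks() {
    sed -n 's/.*INFO: task \(.*\) blocked for more than \([0-9][0-9]*\) seconds.*/\1 \2/p'
}
```

  After a stress-ng run, `dmesg | hung_tasks` would print one line per
  report; for the first line of the trace above that is
  "kworker/0:1:312 120". No output means no such report was logged.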

  
  Over the last few days, I tested all 4.8-* kernels and 4.10 (zesty
  backport); the soft lockup happens with each and every one of them.
  On the other hand, 4.4.0-45-generic seems to work perfectly fine under
  normal conditions (newer 4.4.0-* kernels probably do too, but due to a
  regression in the ethernet drivers after 4.4.0-45 we can't easily test
  those), yet running stress-ng leads to the same oops there as well.

  [1] http://paste.ubuntu.com/24172516/
  --- 
  AlsaDevices:
   total 0
   crw-rw---- 1 root audio 116,  1 Mar 13 19:27 seq
   crw-rw---- 1 root audio 116, 33 Mar 13 19:27 timer
  AplayDevices: Error: [Errno 2] No such file or directory
  ApportVersion: 2.20.1-0ubuntu2.5
  Architecture: arm64
  ArecordDevices: Error: [Errno 2] No such file or directory
  AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
  DistroRelease: Ubuntu 16.04
  IwConfig: Error: [Errno 2] No such file or directory
  MachineType: GIGABYTE R120-T30
  Package: linux (not installed)
  PciMultimedia:
   
  ProcEnviron:
   TERM=vt220
   PATH=(custom, no user)
   XDG_RUNTIME_DIR=<set>
   LANG=en_US.UTF-8
   SHELL=/bin/bash
  ProcFB: 0 astdrmfb
  ProcKernelCmdLine: BOOT_IMAGE=/vmlinuz-4.8.0-41-generic root=/dev/mapper/os-root ro console=tty0 console=ttyS0,115200 console=ttyAMA0,115200 net.ifnames=1 biosdevname=0 rootdelay=90 nomodeset quiet splash vt.handoff=7
  ProcVersionSignature: Ubuntu 4.8.0-41.44~16.04.1-generic 4.8.17
  RelatedPackageVersions:
   linux-restricted-modules-4.8.0-41-generic N/A
   linux-backports-modules-4.8.0-41-generic  N/A
   linux-firmware                            1.157.8
  RfKill: Error: [Errno 2] No such file or directory
  Tags:  xenial
  Uname: Linux 4.8.0-41-generic aarch64
  UpgradeStatus: No upgrade log present (probably fresh install)
  UserGroups:
   
  _MarkForUpload: True
  dmi.bios.date: 11/22/2016
  dmi.bios.vendor: GIGABYTE
  dmi.bios.version: T22
  dmi.board.asset.tag: 01234567890123456789AB
  dmi.board.name: MT30-GS0
  dmi.board.vendor: GIGABYTE
  dmi.board.version: 01234567
  dmi.chassis.asset.tag: 01234567890123456789AB
  dmi.chassis.type: 17
  dmi.chassis.vendor: GIGABYTE
  dmi.chassis.version: 01234567
  dmi.modalias: dmi:bvnGIGABYTE:bvrT22:bd11/22/2016:svnGIGABYTE:pnR120-T30:pvr0100:rvnGIGABYTE:rnMT30-GS0:rvr01234567:cvnGIGABYTE:ct17:cvr01234567:
  dmi.product.name: R120-T30
  dmi.product.version: 0100
  dmi.sys.vendor: GIGABYTE

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1672521/+subscriptions

-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp
