Tripped this again today w/ 5.4.0-86-generic:

[179417.505068] watchdog: BUG: soft lockup - CPU#2 stuck for 22s! 
[kworker/2:1:691464]
[179417.505110] Modules linked in: xt_multiport cpuid veth xt_MASQUERADE 
nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo iptable_nat nf_nat 
br_netfilter bridge stp llc aufs overlay nls_iso8859_1 dm_multipath 
scsi_dh_rdac scsi_dh_emc scsi_dh_alua edac_mce_amd ftdi_sio eeepc_wmi kvm_amd 
usbserial asus_wmi sparse_keymap snd_hda_codec_realtek snd_hda_codec_generic 
ledtrig_audio kvm snd_hda_codec_hdmi video snd_hda_intel snd_intel_dspcfg 
wmi_bmof snd_hda_codec snd_hda_core snd_hwdep snd_pcm k10temp snd_timer snd 
soundcore ccp mac_hid nf_log_ipv6 ip6t_REJECT nf_reject_ipv6 xt_hl ip6t_rt 
nf_log_ipv4 nf_log_common ipt_REJECT nf_reject_ipv4 xt_LOG xt_limit xt_addrtype 
sch_fq_codel xt_tcpudp xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 
ip6table_filter ip6_tables iptable_filter bpfilter msr ip_tables x_tables 
autofs4 btrfs zstd_compress raid10 raid456 async_raid6_recov async_memcpy 
async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear 
hid_generic uas usbhid hid
[179417.505142]  usb_storage amdgpu crct10dif_pclmul crc32_pclmul 
ghash_clmulni_intel amd_iommu_v2 gpu_sched aesni_intel ttm crypto_simd 
drm_kms_helper cryptd syscopyarea glue_helper sysfillrect sysimgblt fb_sys_fops 
igb mxm_wmi drm dca i2c_algo_bit ahci libahci i2c_piix4 wmi gpio_amdpt 
gpio_generic
[179417.505153] CPU: 2 PID: 691464 Comm: kworker/2:1 Not tainted 
5.4.0-86-generic #97-Ubuntu
[179417.505154] Hardware name: System manufacturer System Product Name/ROG 
STRIX X399-E GAMING, BIOS 1203 10/09/2019
[179417.505160] Workqueue: events free_work
[179417.505164] RIP: 0010:smp_call_function_many+0x208/0x270
[179417.505166] Code: 92 00 3b 05 8e c9 70 01 89 c7 0f 83 9b fe ff ff 48 63 c7 
48 8b 0b 48 03 0c c5 80 99 84 af 8b 41 18 a8 01 74 0a f3 90 8b 51 18 <83> e2 01 
75 f6 eb c8 89 cf 48 c7 c2 e0 b8 c4 af 4c 89 fe e8 00 1d
[179417.505166] RSP: 0018:ffffa88c84877d00 EFLAGS: 00000202 ORIG_RAX: 
ffffffffffffff13
[179417.505167] RAX: 0000000000000003 RBX: ffff97e83d0abd40 RCX: 
ffff97e83d072080
[179417.505168] RDX: 0000000000000003 RSI: 0000000000000000 RDI: 
0000000000000001
[179417.505168] RBP: ffffa88c84877d40 R08: ffff97e836da7c40 R09: 
0000000000000003
[179417.505169] R10: ffff97e836da7c40 R11: 0000000000000002 R12: 
ffffffffae481930
[179417.505169] R13: 0000000000000000 R14: 0000000000000001 R15: 
0000000000000080
[179417.505170] FS:  0000000000000000(0000) GS:ffff97e83d080000(0000) 
knlGS:0000000000000000
[179417.505170] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[179417.505171] CR2: 00007fde360abfb0 CR3: 00000003f880a000 CR4: 
00000000003406e0
[179417.505171] Call Trace:
[179417.505176]  ? load_new_mm_cr3+0xf0/0xf0
[179417.505177]  on_each_cpu+0x2d/0x60
[179417.505178]  flush_tlb_kernel_range+0x38/0x90
[179417.505179]  __purge_vmap_area_lazy+0x70/0x6d0
[179417.505180]  free_vmap_area_noflush+0xe1/0xf0
[179417.505180]  remove_vm_area+0x9a/0xb0
[179417.505181]  __vunmap+0x5f/0x210
[179417.505182]  free_work+0x25/0x30
[179417.505184]  process_one_work+0x1eb/0x3b0
[179417.505185]  worker_thread+0x4d/0x400
[179417.505186]  kthread+0x104/0x140
[179417.505187]  ? process_one_work+0x3b0/0x3b0
[179417.505188]  ? kthread_park+0x90/0x90
[179417.505190]  ret_from_fork+0x22/0x40
[179426.629482] Sending NMI from CPU 11 to CPUs 17:
[179445.505306] watchdog: BUG: soft lockup - CPU#2 stuck for 22s! 
[kworker/2:1:691464]
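
In case it helps anyone compare how often this fires across kernels, here is a minimal sketch (my own assumption about a convenient approach, not something from the report) that pulls the CPU number, stall duration, task name and PID out of watchdog lines like the ones above. The names `SoftLockup` and `find_soft_lockups` are hypothetical. It collapses whitespace first, because mail clients wrap these log lines at arbitrary points.

```python
import re
from typing import List, NamedTuple


class SoftLockup(NamedTuple):
    """One 'soft lockup' watchdog report (hypothetical helper type)."""
    cpu: int
    seconds: int
    comm: str
    pid: int


# Matches e.g. "soft lockup - CPU#2 stuck for 22s! [kworker/2:1:691464]".
# The task name may itself contain ':', so the PID is taken as the final
# colon-separated numeric field inside the brackets.
_PATTERN = re.compile(
    r"soft lockup - CPU#(\d+) stuck for (\d+)s! \[([^\]]+?):(\d+)\]"
)


def find_soft_lockups(log_text: str) -> List[SoftLockup]:
    """Return every soft lockup report found in (possibly wrapped) log text."""
    # Undo hard line wrapping by collapsing all whitespace runs to one space.
    flat = re.sub(r"\s+", " ", log_text)
    return [
        SoftLockup(int(cpu), int(secs), comm, int(pid))
        for cpu, secs, comm, pid in _PATTERN.findall(flat)
    ]


if __name__ == "__main__":
    sample = (
        "[179417.505068] watchdog: BUG: soft lockup - CPU#2 stuck for 22s! \n"
        "[kworker/2:1:691464]\n"
    )
    for event in find_soft_lockups(sample):
        print(event)
```

Feeding it saved `dmesg` or syslog text should work the same way, since only the watchdog message itself is matched and any timestamp prefix is ignored.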

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1938722

Title:
  watchdog: BUG: soft lockup  on Threadripper 2950X

Status in linux package in Ubuntu:
  Confirmed

Bug description:
  Description:    Ubuntu 20.04.2 LTS
  Release:        20.04

  Been suddenly seeing a number of crashes on my Threadripper 2950X box
  today, after the system had been off over the weekend.

  Suspect it may be tied to Ubuntu 5.4.0-80.90-generic 5.4.124 kernel,
  as I wasn't seeing it last week or previously.

  
  Aug  2 16:52:14 threadripper kernel: [  600.168436] watchdog: BUG: soft 
lockup - CPU#19 stuck for 22s! [kworker/19:0:11301]
  Aug  2 16:52:14 threadripper kernel: [  600.168490] Modules linked in: veth
xt_MASQUERADE nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo iptable_nat
nf_nat br_netfilter bridge stp llc aufs overlay nls_iso8859_1 dm_multipath
scsi_dh_rdac scsi_dh_emc scsi_dh_alua snd_hda_codec_realtek
snd_hda_codec_generic ledtrig_audio snd_hda_codec_hdmi eeepc_wmi snd_hda_intel
edac_mce_amd snd_intel_dspcfg asus_wmi ftdi_sio snd_hda_codec kvm_amd usbserial
sparse_keymap snd_hda_core kvm video wmi_bmof snd_hwdep snd_pcm snd_timer snd
ccp soundcore k10temp mac_hid nf_log_ipv6 ip6t_REJECT nf_reject_ipv6 xt_hl
ip6t_rt nf_log_ipv4 nf_log_common ipt_REJECT nf_reject_ipv4 xt_LOG xt_limit
xt_addrtype sch_fq_codel xt_tcpudp xt_conntrack nf_conntrack nf_defrag_ipv6
nf_defrag_ipv4 ip6table_filter ip6_tables iptable_filter bpfilter ip_tables
x_tables autofs4 btrfs zstd_compress raid10 raid456 async_raid6_recov
async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0
multipath linear hid_generic usbhid hid uas usb_storage amdgpu
  Aug  2 16:52:14 threadripper kernel: [  600.168542]  amd_iommu_v2 gpu_sched
crct10dif_pclmul ttm crc32_pclmul ghash_clmulni_intel drm_kms_helper
syscopyarea aesni_intel crypto_simd mxm_wmi sysfillrect cryptd sysimgblt
glue_helper fb_sys_fops igb drm dca i2c_piix4 ahci i2c_algo_bit libahci
gpio_amdpt wmi gpio_generic
  Aug  2 16:52:14 threadripper kernel: [  600.168558] CPU: 19 PID: 11301 Comm: 
kworker/19:0 Tainted: G             L    5.4.0-80-generic #90-Ubuntu
  Aug  2 16:52:14 threadripper kernel: [  600.168559] Hardware name: System 
manufacturer System Product Name/ROG STRIX X399-E GAMING, BIOS 1203 10/09/2019
  Aug  2 16:52:14 threadripper kernel: [  600.168569] Workqueue: events 
free_work
  Aug  2 16:52:14 threadripper kernel: [  600.168574] RIP: 
0010:smp_call_function_many+0x205/0x270
  Aug  2 16:52:14 threadripper kernel: [  600.168576] Code: e8 50 10 92 00 3b
05 ae cf 70 01 89 c7 0f 83 9b fe ff ff 48 63 c7 48 8b 0b 48 03 0c c5 80 99 64
a1 8b 41 18 a8 01 74 0a f3 90 <8b> 51 18 83 e2 01 75 f6 eb c8 89 cf 48 c7 c2
a0 b8 a4 a1 4c 89 fe
  Aug  2 16:52:14 threadripper kernel: [  600.168577] RSP: 
0018:ffffb66b0aa17d00 EFLAGS: 00000202 ORIG_RAX: ffffffffffffff13
  Aug  2 16:52:14 threadripper kernel: [  600.168579] RAX: 0000000000000003 
RBX: ffff8de1fd4ebd40 RCX: ffff8de1fd0b2540
  Aug  2 16:52:14 threadripper kernel: [  600.168580] RDX: 0000000000000001 
RSI: 0000000000000000 RDI: 0000000000000002
  Aug  2 16:52:14 threadripper kernel: [  600.168580] RBP: ffffb66b0aa17d40 
R08: ffff8de1f6da7190 R09: 0000000000000003
  Aug  2 16:52:14 threadripper kernel: [  600.168581] R10: ffff8de1f6da7190 
R11: 0000000000000002 R12: ffffffffa0281930
  Aug  2 16:52:14 threadripper kernel: [  600.168581] R13: 0000000000000000 
R14: 0000000000000001 R15: 0000000000000080
  Aug  2 16:52:14 threadripper kernel: [  600.168583] FS:  
0000000000000000(0000) GS:ffff8de1fd4c0000(0000) knlGS:0000000000000000
  Aug  2 16:52:14 threadripper kernel: [  600.168583] CS:  0010 DS: 0000 ES: 
0000 CR0: 0000000080050033
  Aug  2 16:52:14 threadripper kernel: [  600.168584] CR2: 000055ea29edefd0 
CR3: 00000009c500a000 CR4: 00000000003406e0
  Aug  2 16:52:14 threadripper kernel: [  600.168585] Call Trace:
  Aug  2 16:52:14 threadripper kernel: [  600.168592]  ? 
load_new_mm_cr3+0xf0/0xf0
  Aug  2 16:52:14 threadripper kernel: [  600.168594]  on_each_cpu+0x2d/0x60
  Aug  2 16:52:14 threadripper kernel: [  600.168596]  
flush_tlb_kernel_range+0x38/0x90
  Aug  2 16:52:14 threadripper kernel: [  600.168597]  
__purge_vmap_area_lazy+0x70/0x6d0
  Aug  2 16:52:14 threadripper kernel: [  600.168598]  
free_vmap_area_noflush+0xe1/0xf0
  Aug  2 16:52:14 threadripper kernel: [  600.168600]  remove_vm_area+0x9a/0xb0
  Aug  2 16:52:14 threadripper kernel: [  600.168602]  __vunmap+0x5f/0x210
  Aug  2 16:52:14 threadripper kernel: [  600.168603]  free_work+0x25/0x30
  Aug  2 16:52:14 threadripper kernel: [  600.168607]  
process_one_work+0x1eb/0x3b0
  Aug  2 16:52:14 threadripper kernel: [  600.168609]  worker_thread+0x4d/0x400
  Aug  2 16:52:14 threadripper kernel: [  600.168611]  kthread+0x104/0x140
  Aug  2 16:52:14 threadripper kernel: [  600.168612]  ? 
process_one_work+0x3b0/0x3b0
  Aug  2 16:52:14 threadripper kernel: [  600.168613]  ? kthread_park+0x90/0x90
  Aug  2 16:52:14 threadripper kernel: [  600.168617]  ret_from_fork+0x22/0x40
  Aug  2 16:52:40 threadripper kernel: [  606.280524] rcu: INFO: rcu_sched 
detected stalls on CPUs/tasks:
  Aug  2 16:52:40 threadripper kernel: [  606.280567] rcu:        2-...0: (1 
GPs behind) idle=ae6/1/0x4000000000000000 softirq=26910/26911 fqs=7179 
  Aug  2 16:52:40 threadripper kernel: [  606.280609] rcu:        18-...0: (1 
GPs behind) idle=c8e/1/0x4000000000000000 softirq=28056/28057 fqs=7179 
  Aug  2 16:52:40 threadripper kernel: [  606.280659]     (detected by 24, 
t=15002 jiffies, g=39017, q=5149545)
  Aug  2 16:52:40 threadripper kernel: [  606.280661] Sending NMI from CPU 24 
to CPUs 2:
  Aug  2 16:52:40 threadripper kernel: [  616.204803] Sending NMI from CPU 24 
to CPUs 18:
  Aug  2 16:52:40 threadripper kernel: [  626.131497] rcu: rcu_sched kthread 
starved for 4960 jiffies! g39017 f0x2 RCU_GP_DOING_FQS(6) ->state=0x0 ->cpu=7
  Aug  2 16:52:40 threadripper kernel: [  626.131554] rcu: RCU grace-period 
kthread stack dump:
  Aug  2 16:52:40 threadripper kernel: [  626.131577] rcu_sched       R  
running task        0    11      2 0x80004000
  Aug  2 16:52:40 threadripper kernel: [  626.131580] Call Trace:
  Aug  2 16:52:40 threadripper kernel: [  626.131589]  __schedule+0x2e3/0x740
  Aug  2 16:52:40 threadripper kernel: [  626.131592]  
preempt_schedule_common+0x18/0x30
  Aug  2 16:52:40 threadripper kernel: [  626.131594]  _cond_resched+0x22/0x30
  Aug  2 16:52:40 threadripper kernel: [  626.131597]  force_qs_rnp+0xa8/0x170
  Aug  2 16:52:40 threadripper kernel: [  626.131598]  ? 
synchronize_sched_expedited_wait+0x180/0x180
  Aug  2 16:52:40 threadripper kernel: [  626.131600]  
rcu_gp_kthread+0x5e8/0x990
  Aug  2 16:52:40 threadripper kernel: [  626.131604]  kthread+0x104/0x140
  Aug  2 16:52:40 threadripper kernel: [  626.131605]  ? 
kfree_call_rcu+0x20/0x20
  Aug  2 16:52:40 threadripper kernel: [  626.131607]  ? kthread_park+0x90/0x90
  Aug  2 16:52:40 threadripper kernel: [  626.131608]  ret_from_fork+0x22/0x40

  ProblemType: Bug
  DistroRelease: Ubuntu 20.04
  Package: linux-image-5.4.0-80-generic 5.4.0-80.90
  ProcVersionSignature: Ubuntu 5.4.0-80.90-generic 5.4.124
  Uname: Linux 5.4.0-80-generic x86_64
  AlsaVersion: Advanced Linux Sound Architecture Driver Version 
k5.4.0-80-generic.
  ApportVersion: 2.20.11-0ubuntu27.18
  Architecture: amd64
  AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/by-path', 
'/dev/snd/controlC0', '/dev/snd/hwC0D0', '/dev/snd/pcmC0D2c', 
'/dev/snd/pcmC0D1p', '/dev/snd/pcmC0D0c', '/dev/snd/pcmC0D0p', 
'/dev/snd/controlC1', '/dev/snd/hwC1D0', '/dev/snd/pcmC1D7p', 
'/dev/snd/pcmC1D3p', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
  Card0.Amixer.info:
   Card hw:0 'Generic'/'HD-Audio Generic at 0xba600000 irq 96'
     Mixer name : 'Realtek ALC1220'
     Components : 'HDA:10ec1168,10438723,00100003'
     Controls      : 46
     Simple ctrls  : 20
  Card1.Amixer.info:
   Card hw:1 'HDMI'/'HDA ATI HDMI at 0x9f860000 irq 98'
     Mixer name : 'ATI R6xx HDMI'
     Components : 'HDA:1002aa01,00aa0100,00100700'
     Controls      : 14
     Simple ctrls  : 2
  CasperMD5CheckResult: skip
  Date: Mon Aug  2 19:09:24 2021
  IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig'
  MachineType: System manufacturer System Product Name
  ProcEnviron:
   TERM=screen.xterm-256color
   PATH=(custom, no user)
   LANG=en_US.UTF-8
   SHELL=/bin/bash
  ProcFB: 0 amdgpudrmfb
  ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-5.4.0-80-generic 
root=UUID=04417339-7685-11e9-bdb0-049226da3a81 ro pci=nommconf consoleblank=60
  RelatedPackageVersions:
   linux-restricted-modules-5.4.0-80-generic N/A
   linux-backports-modules-5.4.0-80-generic  N/A
   linux-firmware                            1.187.15
  RfKill: Error: [Errno 2] No such file or directory: 'rfkill'
  SourcePackage: linux
  UpgradeStatus: Upgraded to focal on 2021-01-23 (191 days ago)
  dmi.bios.date: 10/09/2019
  dmi.bios.vendor: American Megatrends Inc.
  dmi.bios.version: 1203
  dmi.board.asset.tag: Default string
  dmi.board.name: ROG STRIX X399-E GAMING
  dmi.board.vendor: ASUSTeK COMPUTER INC.
  dmi.board.version: Rev 1.xx
  dmi.chassis.asset.tag: Default string
  dmi.chassis.type: 3
  dmi.chassis.vendor: Default string
  dmi.chassis.version: Default string
  dmi.modalias: 
dmi:bvnAmericanMegatrendsInc.:bvr1203:bd10/09/2019:svnSystemmanufacturer:pnSystemProductName:pvrSystemVersion:rvnASUSTeKCOMPUTERINC.:rnROGSTRIXX399-EGAMING:rvrRev1.xx:cvnDefaultstring:ct3:cvrDefaultstring:
  dmi.product.family: To be filled by O.E.M.
  dmi.product.name: System Product Name
  dmi.product.sku: SKU
  dmi.product.version: System Version
  dmi.sys.vendor: System manufacturer

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1938722/+subscriptions


-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp
