The patch has landed in disco's release pocket in the meantime, hence
adjusting the status to Fix Released.

** Changed in: linux (Ubuntu)
       Status: Fix Committed => Fix Released

** Changed in: ubuntu-power-systems
       Status: Fix Committed => Fix Released

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1842465

Title:
  Watchdog error about hard lockup

Status in The Ubuntu-power-systems project:
  Fix Released
Status in linux package in Ubuntu:
  Fix Released

Bug description:
  ---Problem Description---
  Got a message from Watchdog about self-detected hard LOCKUP
   
  ---uname output---
  Linux power 5.0.0-23-generic #24~18.04.1-Ubuntu SMP Mon Jul 29 16:08:34 UTC 2019 ppc64le ppc64le ppc64le GNU/Linux
   
  ---Additional Hardware Info---
  Architecture:        ppc64le
  Byte Order:          Little Endian
  CPU(s):              128
  On-line CPU(s) list: 0-127
  Thread(s) per core:  4
  Core(s) per socket:  16
  Socket(s):           2
  NUMA node(s):        6
  Model:               2.2 (pvr 004e 1202)
  Model name:          POWER9, altivec supported
  CPU max MHz:         3800.0000
  CPU min MHz:         2300.0000
  L1d cache:           32K
  L1i cache:           32K
  L2 cache:            512K
  L3 cache:            10240K
  NUMA node0 CPU(s):   0-63
  NUMA node8 CPU(s):   64-127
  NUMA node252 CPU(s):
  NUMA node253 CPU(s):
  NUMA node254 CPU(s):
  NUMA node255 CPU(s):
  ---
  free
                total        used        free      shared  buff/cache   available
  Mem:     1071807104     5110016   985192768     6229440    81504320  1056273664
  Swap:       2097088           0     2097088
  --
  lsblk
  NAME    MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
  sda       8:0    1 894.3G  0 disk
  ├─sda1    8:1    1     7M  0 part
  └─sda2    8:2    1 894.3G  0 part /
  sdb       8:16   1 894.3G  0 disk
  nvme0n1 259:1    0   2.9T  0 disk /nvmdisk1
  ---
   
  Machine Type = AC922, bare metal 
   
  ---Steps to Reproduce---
   I encountered this problem while running a customer workload: I switched
  the SMT level from SMT2 to SMT1 and got a lockup error right away. This
  seems to be a different one. The postgresql DB daemon was running on the
  system.
   
  Stack trace output:
  [756383.688067] watchdog: CPU 53 self-detected hard LOCKUP @ _raw_spin_lock+0x54/0xe0
  [756383.688068] watchdog: CPU 53 TB:387344180861438, last heartbeat TB:387337108856720 (13812ms ago)
  [756383.688069] Modules linked in: binfmt_misc veth ipt_MASQUERADE nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo iptable_nat nf_nat_ipv4 xt_addrtype iptable_filter bpfilter xt_conntrack nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 br_netfilter bridge stp llc aufs overlay vmx_crypto ofpart cmdlinepart powernv_flash ipmi_powernv opal_prd mtd ipmi_devintf at24 ibmpowernv ipmi_msghandler uio_pdrv_genirq uio sch_fq_codel ib_iser rdma_cm iw_cm ib_cm iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ip_tables x_tables autofs4 btrfs zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear mlx5_ib ib_uverbs ib_core ast crct10dif_vpmsum i2c_algo_bit crc32c_vpmsum ttm mlx5_core drm_kms_helper syscopyarea nvme sysfillrect sysimgblt fb_sys_fops drm nvme_core ahci libahci tls mlxfw devlink tg3 drm_panel_orientation_quirks
  [756383.688088] CPU: 53 PID: 119744 Comm: postgres Not tainted 5.0.0-23-generic #24~18.04.1-Ubuntu
  [756383.688088] NIP:  c000000000e0fcc4 LR: c00000000015fd90 CTR: c000000000600460
  [756383.688089] REGS: c000007fffb3bd70 TRAP: 0900   Not tainted  (5.0.0-23-generic)
  [756383.688089] MSR:  9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE>  CR: 28242824  XER: 00000000
  [756383.688091] CFAR: c000000000e0fcec IRQMASK: 1 
  [756383.688092] GPR00: c00000000015fd90 c000206f2cdf7970 c00000000185c700 c00020732ea49100
  [756383.688093] GPR04: c000206f2cdf7a38 0000000000000000 c000206f2cdf7b00 0000000000000001
  [756383.688095] GPR08: 0000000000000003 000000008000007d 0000000080000035 fffffffffffffffd
  [756383.688096] GPR12: 0000000000002000 c000007ffffc5080 00007cde07504dd8 00000f495eee0d68
  [756383.688097] GPR16: 00007fffc0eb2bd7 00007fffc0eb2aa0 00000f496c289088 00007fffc0eb2a74
  [756383.688098] GPR20: 0000000000000000 0000000000000001 0000000000000001 0000000000000000
  [756383.688099] GPR24: 0000000000000000 c000206f2cdf7a38 c000000001349100 000020732d700000
  [756383.688100] GPR28: c000000001891c70 c000206f36d8b400 c000000001895c78 c00020732ea49100
  [756383.688102] NIP [c000000000e0fcc4] _raw_spin_lock+0x54/0xe0
  [756383.688102] LR [c00000000015fd90] __task_rq_lock+0x80/0x150
  [756383.688102] Call Trace:
  [756383.688103] [c000206f2cdf7970] [c000206f2cdf79d0] 0xc000206f2cdf79d0 (unreliable)
  [756383.688103] [c000206f2cdf79a0] [c000007fd3847818] 0xc000007fd3847818
  [756383.688104] [c000206f2cdf7a10] [c0000000001649c0] try_to_wake_up+0x380/0x710
  [756383.688105] [c000206f2cdf7aa0] [c000000000164de0] wake_up_q+0x70/0xd0
  [756383.688105] [c000206f2cdf7ae0] [c0000000005fab54] do_semtimedop+0x474/0xcc0
  [756383.688106] [c000206f2cdf7d60] [c0000000005fc634] ksys_semtimedop+0xd4/0xf0
  [756383.688107] [c000206f2cdf7dc0] [c00000000060047c] sys_ipc+0x14c/0x470
  [756383.688107] [c000206f2cdf7e20] [c00000000000b288] system_call+0x5c/0x70
  [756383.688108] Instruction dump:
  [756383.688108] 40c20010 7d40192d 40c2fff0 7c2004ac 2fa90000 4d9e0020 fbc1fff0 3fc20004
  [756383.688110] 3bde9578 fbe1fff8 7c7f1b78 f821ffd1 <7c210b78> e93e0000 75290010 41820014
  [756386.336267] watchdog: CPU 53 became unstuck TB:387345536789288
  [756386.336292] CPU: 53 PID: 330 Comm: migration/53 Not tainted 5.0.0-23-generic #24~18.04.1-Ubuntu
  [756386.336294] Call Trace:
  [756386.336301] [c000007fed49fb40] [c000000000dea90c] dump_stack+0xb0/0xf4 (unreliable)
  [756386.336307] [c000007fed49fb80] [c0000000000342dc] wd_smp_clear_cpu_pending+0x41c/0x430
  [756386.336311] [c000007fed49fc30] [c00000000022909c] multi_cpu_stop+0x14c/0x210
  [756386.336313] [c000007fed49fc90] [c0000000002294bc] cpu_stopper_thread+0xfc/0x1e0
  [756386.336317] [c000007fed49fd40] [c000000000157d00] smpboot_thread_fn+0x270/0x2c0
  [756386.336321] [c000007fed49fdb0] [c000000000151608] kthread+0x1a8/0x1b0
  [756386.336324] [c000007fed49fe20] [c00000000000b65c] ret_from_kernel_thread+0x5c/0x80
  [771875.432658] irq_migrate_all_off_this_cpu: 91 callbacks suppressed
  [771875.432660] IRQ 110: no longer affine to CPU1
  [771875.432694] IRQ 194: no longer affine to CPU1
  [771875.498115] IRQ 192: no longer affine to CPU5
  [771875.498124] IRQ 193: no longer affine to CPU5
  [771875.498133] IRQ 201: no longer affine to CPU5
  [771875.551051] IRQ 153: no longer affine to CPU9
  [771875.551073] IRQ 229: no longer affine to CPU9
  [771875.551149] IRQ 543: no longer affine to CPU9
  [771875.602160] IRQ 199: no longer affine to CPU13
  [771875.602170] IRQ 226: no longer affine to CPU13
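
  The "(13812ms ago)" figure in the watchdog line above can be cross-checked
  from the two TB (timebase) values. The sketch below is only an illustration:
  it assumes the POWER9 timebase ticks at 512 MHz, which the report does not
  state, and the helper name tb_delta_ms is made up for this example.

```python
# Assumption (not stated in the report): the timebase runs at 512 MHz.
TIMEBASE_HZ = 512_000_000


def tb_delta_ms(now_tb, last_tb, hz=TIMEBASE_HZ):
    """Return the whole milliseconds elapsed between two timebase readings."""
    return (now_tb - last_tb) * 1000 // hz


# TB values copied from the "self-detected hard LOCKUP" message.
since_heartbeat = tb_delta_ms(387344180861438, 387337108856720)

# TB value from the "became unstuck" message versus the lockup report.
until_unstuck = tb_delta_ms(387345536789288, 387344180861438)

print(since_heartbeat, until_unstuck)
```

  Under that assumption the first value reproduces the 13812 ms printed by
  the watchdog, and the second is consistent with the gap between the dmesg
  timestamps 756383.688 and 756386.336.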


  == srikar.dronamr...@in.ibm.com ==
  Also, these false positives will probably be fixed by the commit

  https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=7ae3f6e130e8dc6188b59e3b4ebc2f16e9c8d053

  which reads
  From 7ae3f6e130e8dc6188b59e3b4ebc2f16e9c8d053 Mon Sep 17 00:00:00 2001
  From: Nicholas Piggin <npig...@gmail.com>
  Date: Tue, 9 Apr 2019 14:40:05 +1000
  Subject: [PATCH] powerpc/watchdog: Use hrtimers for per-CPU heartbeat

  Using a jiffies timer creates a dependency on the tick_do_timer_cpu
  incrementing jiffies. If that CPU has locked up and jiffies is not
  incrementing, the watchdog heartbeat timer for all CPUs stops and
  creates false positives and confusing warnings on local CPUs, and
  also causes the SMP detector to stop, so the root cause is never
  detected.

  Fix this by using hrtimer based timers for the watchdog heartbeat,
  like the generic kernel hardlockup detector.
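
  The dependency described in the commit message can be pictured with a toy
  model. This is not kernel code and only mimics the false-positive part of
  the problem: every name below (flagged_cpus, jiffies_cpu, stuck_cpu,
  threshold) is hypothetical, and time is counted in abstract steps.

```python
def flagged_cpus(per_cpu_timers, ncpus=4, jiffies_cpu=0, stuck_cpu=0,
                 steps=10, threshold=3):
    """Return the CPUs whose last heartbeat is older than `threshold` steps.

    per_cpu_timers=False models heartbeats that fire only while jiffies
    advances, which requires `jiffies_cpu` to be alive.
    per_cpu_timers=True models each CPU running its own independent timer.
    """
    last_beat = [0] * ncpus
    # jiffies stops advancing if the CPU responsible for it is the stuck one.
    jiffies_advances = jiffies_cpu != stuck_cpu
    for now in range(1, steps + 1):
        for cpu in range(ncpus):
            if cpu == stuck_cpu:
                continue  # a locked-up CPU never sends a heartbeat
            if per_cpu_timers or jiffies_advances:
                last_beat[cpu] = now
    return [cpu for cpu in range(ncpus) if steps - last_beat[cpu] > threshold]


# Shared jiffies-driven heartbeat: every CPU looks locked up.
print(flagged_cpus(per_cpu_timers=False))
# Independent per-CPU timers: only the CPU that is really stuck is flagged.
print(flagged_cpus(per_cpu_timers=True))
```

  In this model the shared-timer case flags all CPUs once the jiffies CPU
  stops, while the per-CPU case singles out the stuck CPU.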

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu-power-systems/+bug/1842465/+subscriptions

-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp
