** Changed in: ubuntu-power-systems
Status: Fix Committed => Fix Released
--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1768898
Title:
smp_call_function_single/many core hangs with stop4 alone
Status in The Ubuntu-power-systems project:
Fix Released
Status in linux package in Ubuntu:
Fix Released
Status in linux source package in Bionic:
Fix Released
Bug description:
== SRU Justification ==
IBM reports that this bug occurs with stop4 which results in soft lockups/rcu
stalls.
This is a kernel synchronization issue leading to a dead lock.
This bug was introduced by commit 7bc54b652f13 in v4.8-rc1. This
regression is fixed by mainline commit c0f7f5b6c6910.
== Fix ==
c0f7f5b6c6910 ("cpufreq: powernv: Fix hardlockup due to synchronous smp_call
in timer interrupt")
== Regression Potential ==
Low. Fixes current regression. Cc'd to upstream stable, so it has had
additon upstream review.
== Test Case ==
A test kernel was built with this patch and tested by the original bug
reporter.
The bug reporter states the test kernel resolved the bug.
Recently we discovered this bug occurs just alone with stop4 which
results in soft lockups/rcu stalls.
```
root@ltc-boston125:~# [15523.619395] systemd[1]: systemd-journald.service:
Processes still around after final SIGKILL. Entering failed mode.
[15523.619508] systemd[1]: systemd-journald.service: Failed with result
'timeout'.
[15523.619769] systemd[1]: Failed to start Journal Service.
[15523.620618] systemd[1]: systemd-journald.service: Service has no hold-off
time, scheduling restart.
[15523.620774] systemd[1]: systemd-journald.service: Scheduled restart job,
restart counter is at 21.
[15523.621462] systemd[1]: Stopped Journal Service.
[15523.621635] systemd[1]: systemd-journald.service: Found left-over process
1561 (systemd-journal) in control group while starting unit. Ignoring.
[15523.621756] systemd[1]: This usually indicates unclean termination of a
previous run, or service implementation deficiencies.
[15523.621888] systemd[1]: systemd-journald.service: Found left-over process
69060 (systemd-journal) in control group while starting unit. Ignoring.
[15523.622029] systemd[1]: This usually indica[15541.629904] INFO: rcu_sched
self-detected stall on CPU
[15541.629958] 60-....: (2 GPs behind) idle=146/140000000000002/0
softirq=300022/300022 fqs=999069
[15541.630046] (t=2415546 jiffies g=184827 c=184826 q=57111)
[15541.630101] NMI backtrace for cpu 60
[15541.630135] CPU: 60 PID: 4810 Comm: tlbie_test Tainted: G L
4.15.0-15-generic #16-Ubuntu
[15541.630207] Call Trace:
[15541.630232] [c000201a1da96b00] [c000000000ceb35c] dump_stack+0xb0/0xf4
(unreliable)
[15541.630298] [c000201a1da96b40] [c000000000cf4d48]
nmi_cpu_backtrace+0x1f8/0x200
[15541.630363] [c000201a1da96bd0] [c000000000cf4ee8]
nmi_trigger_cpumask_backtrace+0x198/0x1f0
[15541.630429] [c000201a1da96c60] [c00000000002f2d8]
arch_trigger_cpumask_backtrace+0x28/0x40
[15541.630495] [c000201a1da96c80] [c0000000001a913c]
rcu_dump_cpu_stacks+0xf4/0x158
[15541.630560] [c000201a1da96cd0] [c0000000001a81e8]
rcu_check_callbacks+0x8e8/0xb40
[15541.630625] [c000201a1da96e00] [c0000000001b64a8]
update_process_times+0x48/0x90
[15541.630689] [c000201a1da96e30] [c0000000001ce1f4]
tick_sched_handle.isra.5+0x34/0xd0
[15541.630753] [c000201a1da96e60] [c0000000001ce2f0]
tick_sched_timer+0x60/0xe0
[15541.630818] [c000201a1da96ea0] [c0000000001b7054]
__hrtimer_run_queues+0x144/0x370
[15541.630883] [c000201a1da96f20] [c0000000001b7fac]
hrtimer_interrupt+0xfc/0x350
[15541.630948] [c000201a1da96ff0] [c0000000000248f0]
__timer_interrupt+0x90/0x260
[15541.631013] [c000201a1da97040] [c000000000024d08] timer_interrupt+0x98/0xe0
[15541.631069] [c000201a1da97070] [c000000000009014]
decrementer_common+0x114/0x120
[15541.631135] --- interrupt: 901 at smp_call_function_single+0x134/0x180
[15541.631135] LR = smp_call_function_single+0x110/0x180
[15541.631230] [c000201a1da973d0] [c0000000001d55e0]
smp_call_function_any+0x180/0x250
[15541.631294] [c000201a1da97430] [c000000000acd3e8]
gpstate_timer_handler+0x1e8/0x580
[15541.631359] [c000201a1da974e0] [c0000000001b46b0] call_timer_fn+0x50/0x1c0
[15541.631433] [c000201a1da97560] [c0000000001b4958] expire_timers+0x138/0x1f0
[15541.631488] [c000201a1da975d0] [c0000000001b4bf8]
run_timer_softirq+0x1e8/0x270
[15541.631553] [c000201a1da97670] [c000000000d0d6c8] __do_softirq+0x158/0x3e4
[15541.631608] [c000201a1da97750] [c000000000114be8] irq_exit+0xe8/0x120
[15541.631663] [c000201a1da97770] [c000000000024d0c] timer_interrupt+0x9c/0xe0
[15541.631718] [c000201a1da977a0] [c000000000009014]
decrementer_common+0x114/0x120
[15541.631784] --- interrupt: 901 at smp_call_function_many+0x330/0x450
[15541.631784] LR = smp_call_function_many+0x324/0x450
[15541.631879] [c000201a1da97b00] [c000000000075f18] pmdp_invalidate+0x98/0xe0
[15541.631935] [c000201a1da97b30] [c0000000003a1120]
change_huge_pmd+0xe0/0x270
[15541.632000] [c000201a1da97ba0] [c000000000349278]
change_protection_range+0xb88/0xe40
[15541.632065] [c000201a1da97cf0] [c0000000003496c0]
mprotect_fixup+0x140/0x340
[15541.632129] [c000201a1da97db0] [c000000000349a74] SyS_mprotect+0x1b4/0x350
[15541.632185] [c000201a1da97e30] [c00000000000b184] system_call+0x58/0x6c
[15579.001651] watchdog: BUG: soft lockup - CPU#52 stuck for 23s! [grep:69263]
[15579.001738] Modules linked in: vhost_net vhost tap xt_CHECKSUM
iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4
nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT
nf_reject_ipv4 xt_tcpudp bridge stp llc ebtable_filter ebtables ip6table_filter
ip6_tables iptable_filter devlink input_leds joydev mac_hid idt_89hpesx ofpart
cmdlinepart powernv_flash ipmi_powernv ipmi_devintf opal_prd mtd
ipmi_msghandler ibmpowernv at24 uio_pdrv_genirq uio vmx_crypto kvm_hv kvm
sch_fq_codel ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp
libiscsi scsi_transport_iscsi ip_tables x_tables autofs4 btrfs zstd_compress
raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor
raid6_pq libcrc32c raid1 raid0 multipath linear ses enclosure ast
[15579.002363] i2c_algo_bit hid_generic ttm drm_kms_helper mpt3sas
syscopyarea sysfillrect usbhid sysimgblt fb_sys_fops hid raid_class
crct10dif_vpmsum crc32c_vpmsum drm i40e aacraid scsi_transport_sas
[15579.002524] CPU: 52 PID: 69263 Comm: grep Tainted: G L
4.15.0-15-generic #16-Ubuntu
[15579.002598] NIP: c0000000001d5368 LR: c0000000001d5340 CTR:
c000000000acc7f0
[15579.002664] REGS: c000003e84eff7e0 TRAP: 0901 Tainted: G L
(4.15.0-15-generic)
[15579.002735] MSR: 9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE> CR: 48044222
XER: 00000000
[15579.002810] CFAR: c01721ed8
[15579.002810] GPR08: c000000001721ed8 0000000000000001 c009e006592e0960
0000000000000000
[15579.002810] GPR12: c000000000acc7f0 c00000000faa3c00
[15579.003084] NIP [c0000000001d5368] smp_call_function_single+0x138/0x180
[15579.003139] LR [c0000000001d5340] smp_call_function_single+0x110/0x180
[15579.003191] Call Trace:
[15579.003217] [c000003e84effa60] [c0000000001d5340]
smp_call_function_single+0x110/0x180 (unreliable)
[15579.003298] [c000003e84effad0] [c0000000001d55e0]
smp_call_function_any+0x180/0x250
[15579.003381] [c000003e84effb30] [c000000000acc840]
powernv_cpufreq_get+0x50/0x70
[15579.003447] [c000003e84effb60] [c000000000ac2b8c] __cpufreq_get+0x5c/0x140
[15579.003503] [c000003e84effba0] [c000000000ac2d18] cpufreq_get+0xa8/0xb0
[15579.003560] [c000003e84effbe0] [c00000000009da50]
pnv_get_proc_freq+0x20/0x50
[15579.003625] [c000003e84effc00] [c0000000000283bc] show_cpuinfo+0x11c/0x400
[15579.003680] [c000003e84effca0] [c00000000040c738] seq_read+0x138/0x610
[15579.003737] [c000003e84effd40] [c00000000047fa38] proc_reg_read+0x88/0xd0
[15579.003794] [c000003e84effd70] [c0000000003d293c] __vfs_read+0x3c/0x70
[15579.003849] [c000003e84effd90] [c0000000003d2a2c] vfs_read+0xbc/0x1b0
[15579.003905] [c000003e84effde0] [c0000000003d3028] SyS_read+0x68/0x110
[15579.003962] [c000003e84effe30] [c00000000000b184] system_call+0x58/0x6c
[15579.004016] Instruction dump:
[15579.004051] 7fe4fb78 4bfffd4d 813f0018 71290001 4182002c 48000014 60000000
60000000
[15579.004121] 60000000 60420000 7c210b78 7c421378 <813f0018> 71290001
4082fff0 7c2004ac
[15604.648202] INFO: rcu_sched self-detected stall on CPU
[15604.648260] 60-....: (2 GPs behind) idle=146/140000000000002/0
softirq=300022/300022 fqs=1005652
[15604.648332] (t=2431300 jiffies g=184827 c=184826 q=57308)
[15604.648385] NMI backtrace for cpu 60
[15604.648419] CPU: 60 PID: 4810 Comm: tlbie_test Tainted: G L
4.15.0-15-generic #16-Ubuntu
[15604.648491] Call Trace:
[15604.648515] [c000201a1da96b00] [c000000000ceb35c] dump_stack+0xb0/0xf4
(unreliable)
[15604.648581] [c000201a1da96b40] [c000000000cf4d48]
nmi_cpu_backtrace+0x1f8/0x200
[15604.648647] [c000201a1da96bd0] [c000000000cf4ee8]
nmi_trigger_cpumask_backtrace+0x198/0x1f0
[15604.648728] [c000201a1da96c60] [c00000000002f2d8]
arch_trigger_cpumask_backtrace+0x28/0x40
[15604.648793] [c000201a1da96c80] [c0000000001a913c]
rcu_dump_cpu_stacks+0xf4/0x158
[15604.648858] [c000201a1da96cd0] [c0000000001a81e8]
rcu_check_callbacks+0x8e8/0xb40
[15604.648924] [c000201a1da96e00] [c0000000001b64a8]
update_process_times+0x48/0x90
[15604.648988] [c000201a1da96e30] [c0000000001ce1f4]
tick_sched_handle.isra.5+0x34/0xd0
[15604.649052] [c000201a1da96e60] [c0000000001ce2f0]
tick_sched_timer+0x60/0xe0
[15604.649118] [c000201a1da96ea0] [c0000000001b7054]
__hrtimer_run_queues+0x144/0x370
[15604.649183] [c000201a1da96f20] [c0000000001b7fac]
hrtimer_interrupt+0xfc/0x350
[15604.649248] [c000201a1da96ff0] [c0000000000248f0]
__timer_interrupt+0x90/0x260
[15604.649313] [c000201a1da97040] [c000000000024d08] timer_interrupt+0x98/0xe0
[15604.649369] [c000201a1da97070] [c000000000009014]
decrementer_common+0x114/0x120
[15604.649435] --- interrupt: 901 at smp_call_function_single+0x138/0x180
[15604.649435] LR = smp_call_function_single+0x110/0x180
[15604.649530] [c000201a1da973d0] [c0000000001d55e0]
smp_call_function_any+0x180/0x250
[15604.649595] [c000201a1da97430] [c000000000acd3e8]
gpstate_timer_handler+0x1e8/0x580
[15604.649660] [c000201a1da974e0] [c0000000001b46b0] call_timer_fn+0x50/0x1c0
[15604.649715] [c000201a1da97560] [c0000000001b4958] expire_timers+0x138/0x1f0
[15604.649770] [c000201a1da975d0] [c0000000001b4bf8]
run_timer_softirq+0x1e8/0x270
[15604.649835] [c000201a1da97670] [c000000000d0d6c8] __do_softirq+0x158/0x3e4
[15604.649891] [c000201a1da97750] [c000000000114be8] irq_exit+0xe8/0x120
[15604.649946] [c000201a1da97770] [c000000000024d0c] timer_interrupt+0x9c/0xe0
[15604.650002] [c000201a1da977a0] [c000000000009014]
decrementer_common+0x114/0x120
[15604.650084] --- interrupt: 901 at smp_call_function_many+0x330/0x450
[15604.650084] LR = smp_call_function_many+0x324/0x450
[15604.650179] [c000201a1da97b00] [c000000000075f18] pmdp_invalidate+0x98/0xe0
[15604.650235] [c000201a1da97b30] [c0000000003a1120]
change_huge_pmd+0xe0/0x270
[15604.650301] [c000201a1da97ba0] [c000000000349278]
change_protection_range+0xb88/0xe40
[15604.650366] [c000201a1da97cf0] [c0000000003496c0]
mprotect_fixup+0x140/0x340
[15604.650430] [c000201a1da97db0] [c000000000349a74] SyS_mprotect+0x1b4/0x350
[15604.650486] [c000201a1da97e30] [c00000000000b184] system_call+0x58/0x6c
[15667.666494] INFO: rcu_sched self-detected stall on CPU
[15667.666550] 60-....: (2 GPs behind) idle=146/140000000000002/0
softirq=300022/300022 fqs=1012258
[15667.666622] (t=2447054 jiffies g=184827 c=184826 q=57457)
[15667.666675] NMI backtrace for cpu 60
[15667.666709] CPU: 60 PID: 4810 Comm: tlbie_test Tainted: G L
4.15.0-15-generic #16-Ubuntu
[15667.666781] Call Trace:
[15667.666805] [c000201a1da96b00] [c000000000ceb35c] dump_stack+0xb0/0xf4
(unreliable)
[15667.666871] [c000201a1da96b40] [c000000000cf4d48]
nmi_cpu_backtrace+0x1f8/0x200
[15667.666937] [c000201a1da96bd0] [c000000000cf4ee8]
nmi_trigger_cpumask_backtrace+0x198/0x1f0
[15667.667002] [c000201a1da96c60] [c00000000002f2d8]
arch_trigger_cpumask_backtrace+0x28/0x40
[15667.667086] [c000201a1da96c80] [c0000000001a913c]
rcu_dump_cpu_stacks+0xf4/0x158
[15667.667151] [c000201a1da96cd0] [c0000000001a81e8]
rcu_check_callbacks+0x8e8/0xb40
[15667.667216] [c000201a1da96e00] [c0000000001b64a8]
update_process_times+0x48/0x90
[15667.667280] [c000201a1da96e30] [c0000000001ce1f4]
tick_sched_handle.isra.5+0x34/0xd0
[15667.667344] [c000201a1da96e60] [c0000000001ce2f0]
tick_sched_timer+0x60/0xe0
[15667.667409] [c000201a1da96ea0] [c0000000001b7054]
__hrtimer_run_queues+0x144/0x370
[15667.667474] [c000201a1da96f20] [c0000000001b7fac]
hrtimer_interrupt+0xfc/0x350
[15667.667539] [c000201a1da96ff0] [c0000000000248f0]
__timer_interrupt+0x90/0x260
[15667.667604] [c000201a1da97040] [c000000000024d08] timer_interrupt+0x98/0xe0
[15667.667660] [c000201a1da97070] [c000000000009014]
decrementer_common+0x114/0x120
[15667.667727] --- interrupt: 901 at smp_call_function_single+0x130/0x180
[15667.667727] LR = smp_call_function_single+0x110/0x180
[15667.667821] [c000201a1da973d0] [c0000000001d55e0]
smp_call_function_any+0x180/0x250
[15667.667886] [c000201a1da97430] [c000000000acd3e8]
gpstate_timer_handler+0x1e8/0x580
[15667.667951] [c000201a1da974e0] [c0000000001b46b0] call_timer_fn+0x50/0x1c0
[15667.668006] [c000201a1da97560] [c0000000001b4958] expire_timers+0x138/0x1f0
[15667.668061] [c000201a1da975d0] [c0000000001b4bf8]
run_timer_softirq+0x1e8/0x270
[15667.668126] [c000201a1da97670] [c000000000d0d6c8] __do_softirq+0x158/0x3e4
[15667.668181] [c000201a1da97750] [c000000000114be8] irq_exit+0xe8/0x120
[15667.668236] [c000201a1da97770] [c000000000024d0c] timer_interrupt+0x9c/0xe0
[15667.668292] [c000201a1da977a0] [c000000000009014]
decrementer_common+0x114/0x120
[15667.668358] --- interrupt: 901 at smp_call_function_many+0x330/0x450
[15667.668358] LR = smp_call_function_many+0x324/0x450
[15667.668469] [c000201a1da97b00] [c000000000075f18] pmdp_invalidate+0x98/0xe0
[15667.668524] [c000201a1da97b30] [c0000000003a1120]
change_huge_pmd+0xe0/0x270
[15667.668589] [c000201a1da97ba0] [c000000000349278]
change_protection_range+0xb88/0xe40
[15667.668654] [c000201a1da97cf0] [c0000000003496c0]
mprotect_fixup+0x140/0x340
[15667.668719] [c000201a1da97db0] [c000000000349a74] SyS_mprotect+0x1b4/0x350
[15667.668775] [c000201a1da97e30] [c00000000000b184] system_call+0x58/0x6c
```
Per feedback from Vaidy, this currently appears to NOT be a firmware
problem. This seems to be a kernel synchronization issue leading to a
dead lock.
-------
Fix identified by Shilpa as per Nick Piggin's recommendation. Kernel fix is
currently being tested.
-------
Fix upstream in 4.17-rc3
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v4.17-rc3&id=c0f7f5b6c69107ca92909512533e70258ee19188
cpufreq: powernv: Fix hardlockup due to synchronous smp_call in timer
interrupt
Posted to stable as well.
Mirroring to Launchpad for Canonical to pull in commit.
To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu-power-systems/+bug/1768898/+subscriptions
--
Mailing list: https://launchpad.net/~kernel-packages
Post to : [email protected]
Unsubscribe : https://launchpad.net/~kernel-packages
More help : https://help.launchpad.net/ListHelp