On Mon, Jun 24, 2019 at 11:59:48AM -0000, bugproxy wrote:
> ------- Comment From hbath...@in.ibm.com 2019-06-24 07:49 EDT-------
> Thanks for the change. With it, try-restart is being triggered for the
> kdump-tools service after a CPU add operation, but systemd reported a
> failure with the logs below:
>
> Jun 24 06:47:06 ubuntu systemd[1]: Stopped Kernel crash dump capture service.
> Jun 24 06:47:06 ubuntu systemd[1]: Starting Kernel crash dump capture service...
> Jun 24 06:47:06 ubuntu kdump-tools[2023]: Starting kdump-tools: * Creating symlink /var/lib/kdump/vmlinuz
> Jun 24 06:47:06 ubuntu kdump-tools[2023]: * Creating symlink /var/lib/kdump/initrd.img
> Jun 24 06:47:06 ubuntu kdump-tools[2023]: Modified cmdline:BOOT_IMAGE=/vmlinux-5.0.0-17-generic root=/dev/mapper/ubuntu--vg-root ro systemd.unit=kdump-tools-dump.service maxcpus=1 irqpo
> Jun 24 06:47:06 ubuntu systemd[1]: kdump-tools.service: Main process exited, code=killed, status=15/TERM
> Jun 24 06:47:06 ubuntu systemd[1]: kdump-tools.service: Failed with result 'signal'.
> Jun 24 06:47:06 ubuntu systemd[1]: Stopped Kernel crash dump capture service.
> Jun 24 06:47:06 ubuntu systemd[1]: Starting Kernel crash dump capture service...
> Jun 24 06:47:06 ubuntu kdump-tools[2071]: Starting kdump-tools: * Creating symlink /var/lib/kdump/vmlinuz
> Jun 24 06:47:06 ubuntu kdump-tools[2071]: * Creating symlink /var/lib/kdump/initrd.img
> Jun 24 06:47:06 ubuntu kdump-tools[2071]: Modified cmdline:BOOT_IMAGE=/vmlinux-5.0.0-17-generic root=/dev/mapper/ubuntu--vg-root ro systemd.unit=kdump-tools-dump.service maxcpus=1 irqpo
> Jun 24 06:47:06 ubuntu systemd[1]: kdump-tools.service: Main process exited, code=killed, status=15/TERM
> Jun 24 06:47:06 ubuntu systemd[1]: kdump-tools.service: Failed with result 'signal'.
> Jun 24 06:47:06 ubuntu systemd[1]: Stopped Kernel crash dump capture service.
> Jun 24 06:47:06 ubuntu systemd[1]: kdump-tools.service: Start request repeated too quickly.
> Jun 24 06:47:06 ubuntu systemd[1]: kdump-tools.service: Failed with result 'signal'.
> Jun 24 06:47:06 ubuntu systemd[1]: Failed to start Kernel crash dump capture service.
>
> ---
> Looks like a rate-limit issue with systemd. Is there some systemd option to
> work around it?
>
> I am running the commands below on a PowerVM machine:
>
> # drmgr -c cpu -r -q 1   (to remove a core)
> # drmgr -c cpu -a -q 1   (to add it back -> this triggers 8 CPU online udev events, as SMT is 8)
>
> To conclude, the udev rule alone is not sufficient. We need a way to handle
> multiple requests arriving at once.
There are these systemd options, which default to a burst limit of 5 restarts
within an interval of 10s:

StartLimitIntervalSec=interval
StartLimitBurst=burst

One other option that I prefer, however, is resetting the start rate limit
counter by running

systemctl reset-failed kdump-tools.service

from the udev rule. Can you try that?

Thanks.
Cascardo.

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to makedumpfile in Ubuntu.
https://bugs.launchpad.net/bugs/1828596

Title:
  kdump fails when crash is triggered after DLPAR cpu add operation

Status in The Ubuntu-power-systems project:
  Confirmed
Status in kexec-tools package in Ubuntu:
  Invalid
Status in makedumpfile package in Ubuntu:
  Fix Released
Status in kexec-tools source package in Xenial:
  New
Status in makedumpfile source package in Xenial:
  New
Status in kexec-tools source package in Bionic:
  New
Status in makedumpfile source package in Bionic:
  New
Status in kexec-tools source package in Cosmic:
  New
Status in makedumpfile source package in Cosmic:
  New
Status in kexec-tools source package in Disco:
  New
Status in makedumpfile source package in Disco:
  New
Status in kexec-tools source package in Eoan:
  Invalid
Status in makedumpfile source package in Eoan:
  Fix Released

Bug description:
  [Impact]
  After a CPU add/hotplug operation on Power systems, kdump will fail after a
  crash. The kdump kernel needs to be reloaded after a CPU add/hotplug.

  [Test case]
  Do a CPU add/hotplug, trigger a crash, and check for a successful kdump.

  [Regression potential]
  Multiple reloads caused by multiple sequential CPU adds may cause spurious
  log results, and systemd may fail to properly reload the kdump kernel.

  == Comment: #0 - Hari Krishna Bathini - 2019-05-10 05:55:40 ==
  ---Problem Description---
  kdump fails when a crash is triggered after a CPU add operation.
  Machine Type = na

  ---System Hang---
  Crashed in the early boot process of the kdump kernel after the crash. Had
  to issue a system reset from the HMC to reclaim the machine.

  ---Steps to Reproduce---
  1. Configure kdump.
  2. Add a CPU from the HMC.
  3. Trigger a crash.
  4. The machine hangs after the crash, as below:

  ---
  [169250.213166] IPI complete
  [169250.234331] kexec: Starting switchover sequence.
  I'm in purgatory
  --- STUCK HERE ---

  ---uname output---
  na

  ---Debugger---
  A debugger is not configured

  == Comment: #1 - Hari Krishna Bathini - 2019-05-10 05:56:46 ==
  The problem is that the kexec udev rule that restarts the kdump-tools
  service when a core is added is not being triggered. The old device tree
  (DT) created by kexec before the core was added is still used by the kdump
  kernel. So, when the system crashes on a thread from the added core(s), the
  kdump kernel fails to get the 'boot_cpuid' and eventually fails to boot.

  == Comment: #2 - Hari Krishna Bathini - 2019-05-10 06:02:27 ==
  The udev rule for CPU add is not triggered because ppc64 does not emit
  add/remove events when a CPU is hot added/removed. It only emits
  online/offline events to user space when a CPU is hot added/removed. So,
  the udev rules below are never triggered when needed:

    SUBSYSTEM=="cpu", ACTION=="add", PROGRAM="/bin/systemctl try-restart kdump-tools.service"
    SUBSYSTEM=="cpu", ACTION=="remove", PROGRAM="/bin/systemctl try-restart kdump-tools.service"

  Also, given how CPU hot add and remove are handled on ppc64, a udev trigger
  to reload kdump after a CPU is hot removed is NOT necessary.
  So, fix the CPU hot add case by updating the udev rule, and drop the udev
  rule meant for CPU hot remove, in the kdump udev rules file:

    SUBSYSTEM=="cpu", ACTION=="online", PROGRAM="/bin/systemctl try-restart kdump-tools.service"
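Pulling the thread together, the `online` rule from comment #2 plus Cascardo's `reset-failed` suggestion can be sketched as a small shell helper, with the `StartLimitIntervalSec`/`StartLimitBurst` route as an alternative. This is a minimal sketch under stated assumptions, not the kdump-tools fix: the file names `99-kdump-cpu-online.rules` and `startlimit.conf`, the optional root-prefix argument (so it can be tried against a scratch directory first) and the default burst of 20 are all hypothetical. The rule lines keep the `PROGRAM=` style used in the bug description.

```sh
#!/bin/sh
# Hypothetical helpers: file names, the ROOT prefix argument and the default
# burst value are illustrative choices, not what kdump-tools ships.

# Install a udev rule that reacts to CPU "online" events (what ppc64 emits on
# a DLPAR CPU add). The first line clears kdump-tools.service's failed state
# and start rate-limit counter; the second asks systemd to restart the
# service only if it is currently running (try-restart).
write_kdump_cpu_rules() {
    root=${1:-}    # empty = live system; pass a scratch dir to try it out
    dir="$root/etc/udev/rules.d"
    mkdir -p "$dir"
    cat > "$dir/99-kdump-cpu-online.rules" <<'EOF'
SUBSYSTEM=="cpu", ACTION=="online", PROGRAM="/bin/systemctl reset-failed kdump-tools.service"
SUBSYSTEM=="cpu", ACTION=="online", PROGRAM="/bin/systemctl try-restart kdump-tools.service"
EOF
}

# Alternative: keep the existing rule and raise the service's start rate
# limit instead. systemd defaults to 5 starts per 10s; both settings
# belong in the [Unit] section of a drop-in.
write_kdump_startlimit_dropin() {
    root=${1:-}
    burst=${2:-20}
    dir="$root/etc/systemd/system/kdump-tools.service.d"
    mkdir -p "$dir"
    cat > "$dir/startlimit.conf" <<EOF
[Unit]
StartLimitIntervalSec=10s
StartLimitBurst=$burst
EOF
}
```

On the machine itself, run `write_kdump_cpu_rules` as root with no argument and then `udevadm control --reload`; after `write_kdump_startlimit_dropin`, run `systemctl daemon-reload`. A differently named file in `/etc/udev/rules.d` is read alongside the packaged rules rather than replacing them; giving it the packaged file's exact name would override that file instead.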