Public bug reported:

== Comment: #0 - Hari Krishna Bathini - 2019-05-10 05:55:40 ==
---Problem Description---
kdump fails when a crash is triggered after a CPU add operation.
 
Machine Type = na 
 
---System Hang---
The kdump kernel crashed early in its boot process after the crash.

A system reset had to be issued from the HMC to reclaim the machine.
 
---Steps to Reproduce---
1. Configure kdump.
2. Add a CPU from the HMC.
3. Trigger a crash.
4. The machine hangs after the crash, as shown below:

---
[169250.213166] IPI complete
[169250.234331] kexec: Starting switchover sequence.
I'm in purgatory
                             --- STUCK HERE ---
 
---uname output---
na
 
---Debugger---
A debugger is not configured

== Comment: #1 - Hari Krishna Bathini  - 2019-05-10 05:56:46 ==
The problem is that the kexec udev rule that restarts the kdump-tools
service when a core is added is not being triggered. As a result, the
kdump kernel uses the old device tree (DT) that kexec created before the
core was added. So when the system crashes on a thread from the added
core(s), the kdump kernel fails to get the 'boot_cpuid' and eventually
fails to boot.
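The failure chain above can be illustrated with a toy Python model. It assumes, as described in this comment, that the kdump kernel must find the CPU it starts on among the CPUs listed in the DT that kexec prepared at load time. `build_dt`, `find_boot_cpuid` and the CPU numbers are hypothetical. This is not kernel or kexec-tools code.

```python
# Toy model (NOT kernel or kexec-tools code) of the stale-DT failure.
# build_dt, find_boot_cpuid and the CPU IDs are hypothetical.


def build_dt(present_cpus):
    """Snapshot the CPU IDs present when kexec loads the kdump kernel."""
    return frozenset(present_cpus)


def find_boot_cpuid(dt_cpus, crashing_cpu):
    """Return the boot CPU ID, or None if the DT does not describe it.

    The kdump kernel starts on the CPU that crashed, so that CPU has to
    appear in the DT it was handed; None models the lookup failure.
    """
    return crashing_cpu if crashing_cpu in dt_cpus else None


if __name__ == "__main__":
    cpus = set(range(8))            # hypothetical CPUs 0-7 before the DLPAR add
    dt = build_dt(cpus)             # kdump-tools loads the crash kernel
    cpus |= set(range(8, 16))       # a core is hot added; kdump is NOT reloaded
    print(find_boot_cpuid(dt, 12))  # crash on an added thread: prints None
    dt = build_dt(cpus)             # roughly what a kdump-tools restart achieves
    print(find_boot_cpuid(dt, 12))  # prints 12
```

The model only captures that the DT is a snapshot. Nothing refreshes it after the add unless the kdump kernel is reloaded.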

== Comment: #2 - Hari Krishna Bathini - 2019-05-10 06:02:27 ==
The udev rule for CPU add is not triggered because ppc64 does not emit
add/remove events when a CPU is hot added/removed. It only emits
online/offline events to user space when a CPU is hot added/removed.

As a result, the following udev rules are never triggered when needed:

SUBSYSTEM=="cpu", ACTION=="add", PROGRAM="/bin/systemctl try-restart kdump-tools.service"
SUBSYSTEM=="cpu", ACTION=="remove", PROGRAM="/bin/systemctl try-restart kdump-tools.service"

Also, given how CPU hot add and remove are handled on ppc64, a udev
trigger to reload kdump after a CPU is hot removed is NOT necessary. So,
fix the CPU hot add case by updating the udev rule, and drop the udev
rule meant for CPU hot remove from the kdump udev rules file:

SUBSYSTEM=="cpu", ACTION=="online", PROGRAM="/bin/systemctl try-restart kdump-tools.service"
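To make the argument concrete, here is a small Python sketch that models udev matching on the SUBSYSTEM and ACTION keys only. It assumes, as stated above, that ppc64 emits just `online` on CPU hot add and `offline` on hot remove. The dictionaries and helper names are hypothetical, and this is not how udev itself is implemented.

```python
# Simplified model of udev "==" key matching for the kdump-tools CPU rules.
# Hypothetical helpers; only SUBSYSTEM/ACTION equality is modeled.
OLD_RULES = [
    {"SUBSYSTEM": "cpu", "ACTION": "add"},
    {"SUBSYSTEM": "cpu", "ACTION": "remove"},
]
NEW_RULES = [
    {"SUBSYSTEM": "cpu", "ACTION": "online"},
]

# Assumption from the comment above: on ppc64, CPU hot add/remove only
# produce online/offline uevents for the cpu subsystem.
PPC64_HOTPLUG_EVENTS = [
    {"SUBSYSTEM": "cpu", "ACTION": "online"},   # CPU hot added
    {"SUBSYSTEM": "cpu", "ACTION": "offline"},  # CPU hot removed
]


def rule_matches(rule, event):
    """A rule matches only if every match key equals the event's value."""
    return all(event.get(key) == value for key, value in rule.items())


def restarts_triggered(rules, events):
    """Count (event, rule) matches, i.e. how often the try-restart would run."""
    return sum(
        1 for event in events for rule in rules if rule_matches(rule, event)
    )


if __name__ == "__main__":
    print("old rules:", restarts_triggered(OLD_RULES, PPC64_HOTPLUG_EVENTS))  # 0
    print("new rules:", restarts_triggered(NEW_RULES, PPC64_HOTPLUG_EVENTS))  # 1
```

With the old rules, no event ever matches, so kdump is never reloaded. The new rule fires once, on the hot add. udev's other operators and keys are deliberately left out of the model.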

** Affects: kexec-tools (Ubuntu)
     Importance: Undecided
     Assignee: Ubuntu on IBM Power Systems Bug Triage (ubuntu-power-triage)
         Status: New


** Tags: architecture-ppc64le bugnameltc-177551 severity-high targetmilestone-inin---

** Tags added: architecture-ppc64le bugnameltc-177551 severity-high targetmilestone-inin---

** Changed in: ubuntu
     Assignee: (unassigned) => Ubuntu on IBM Power Systems Bug Triage (ubuntu-power-triage)

** Package changed: ubuntu => kexec-tools (Ubuntu)

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to kexec-tools in Ubuntu.
https://bugs.launchpad.net/bugs/1828596

Title:
  kdump fails when crash is triggered after DLPAR cpu add operation

Status in kexec-tools package in Ubuntu:
  New

