** Description changed:

+ [WIP] [SRU] Makedumpfile: Errors and Page Exclusions When Opening Kernel
+ Crashdump Files Generated on the Latest HWE Kernel
+ 
+ Note: Work in progress
+ 
+ [Impact]
+ 
+ The current versions of Makedumpfile and Crash in the -updates pocket on
+ Noble do not support the latest hardware enablement kernel for that
+ platform, which is 6.14. There are several architecture-dependent and
+ kernel flavor-dependent behaviours that I will outline below, but the
+ steps to reproduce are the same.
+ 
+ Reproducer steps:
+ -----------------
+ 
+ Boot into a hardware enablement kernel. For example, on arm64 use the
+ 6.14.0-1008-nvidia-64k kernel:
+ 
+ KERNEL_VERSION=6.14.0-1008-nvidia-64k
+ DISTRO=noble
+ 
+ sudo apt update
+ sudo apt install ubuntu-dbgsym-keyring
+ echo "deb http://ddebs.ubuntu.com ${DISTRO} main restricted universe 
multiverse
+ deb http://ddebs.ubuntu.com ${DISTRO}-updates main restricted universe 
multiverse | \
+   sudo tee /etc/apt/sources.list.d/ddebs.list
+ sudo apt update
+ sudo apt install linux-image-${KERNEL_VERSION}
+ sudo apt install linux-image-unsigned-${KERNEL_VERSION}-dbgsym
+ 
+ Modify grub's cmdline to specify a crashkernel: 
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash crashkernel=512M" # Or similar
+ sudo update-grub
+ sudo apt install kexec-tools kdump-tools crash makedumpfile
+ sudo systemctl enable kdump-tools
+ sudo systemctl start kdump-tools
+ sudo reboot
+ 
+ echo c | sudo tee /proc/sysrq-trigger
+ 
+ Results on Arm64
+ ----------------
+ 
+ After the machine recovers,
+ 
+ crash /usr/lib/debug/boot/vmlinux-6.14.0-1008-nvidia-64k
+ /var/crash/<dump-dir>/<dump-file>
+ 
+ crash 8.0.4
+ Copyright (C) 2002-2022  Red Hat, Inc.
+ ...
+ For help, type "help".
+ Type "apropos word" to search for commands related to "word"...
+ 
+ please wait... (gathering task table data)
+ crash: page excluded: kernel virtual address: ffff07ffa042d8e0  type: 
"xa_node.slots[off]"
+ 
+ Results on amd64
+ ----------------
+ 
+ On an amd64 machine, using a kernel such as linux-
+ image-6.14.0-29-generic results in crash failing to open. No error is
+ printed but we don't obtain the prompt:
+ 
+ crash /usr/lib/debug/boot/vmlinux-6.14.0-29-generic
+ /var/crash/202509112049/dump.202509112049
+ 
+ crash 8.0.4
+ ...
+ For help, type "help".
+ Type "apropos word" to search for commands related to "word"...
+ 
+ # Program exits and no prompt is presented
+ 
+ 
+ At the time of writing, we have identified that on the Makedumpfile at least 
two commits are needed:
+ [1] 
https://github.com/makedumpfile/makedumpfile/commit/985e575253f1c2de8d6876cfe685c68a24ee06e1
+ [2] 
https://github.com/makedumpfile/makedumpfile/commit/bad2a7c4fa75d37a41578441468584963028bdda
+ 
+ These are patches to compensate for a change in the kernel's mapping of
+ memory. Using the patched Makedumpfile helps, but it is not sufficient.
+ Including the patches in Makedumpfile (or using the tip of upstream
+ master), but opening with the currently distributed crash results in
+ other non-fatal errors:
+ 
+ eg. Patched Makedumpfile with crash 8.0.4 on Arm64:
+ crash 8.0.4
+ ...
+ WARNING: cannot determine starting stack frame for task ffffd574e21b4800
+ 
+ WARNING: cannot determine starting stack frame for task ffff07ff83296300
+ 
+ WARNING: cannot determine starting stack frame for task ffff07ff83293f80
+ 
+ WARNING: cannot determine starting stack frame for task ffff07ff83a04700
+ 
+ WARNING: cannot determine starting stack frame for task ffff08010507c400
+       KERNEL: /usr/lib/debug/boot/vmlinux-6.14.0-1008-nvidia-64k
+     DUMPFILE: /var/crash/patched_mdf/dump.202509191531  [PARTIAL DUMP]
+         CPUS: 128 [OFFLINE: 127]
+         DATE: Thu Jan  1 00:00:00 UTC 1970
+       UPTIME: 00:13:38
+ LOAD AVERAGE: 0.12, 0.16, 0.10
+        TASKS: 1573
+     NODENAME: penguru
+      RELEASE: 6.14.0-1008-nvidia-64k
+      VERSION: #8-Ubuntu SMP PREEMPT_DYNAMIC Sat Jul 26 02:43:53 UTC 2025
+      MACHINE: aarch64  (unknown Mhz)
+       MEMORY: 63.8 GB
+        PANIC: "Kernel panic - not syncing: sysrq triggered crash"
+          PID: 7886
+      COMMAND: "tee"
+         TASK: ffff08010507c400  [THREAD_INFO: ffff08010507c400]
+          CPU: 85
+        STATE: TASK_RUNNING (PANIC)
+ 
+ On Amd64, crash still fails to open.
+ 
+ Therefore, in addition to the above Makedumpfile commits, crash requires
+ some patching. With the above two commits to Makedumpfile I did a bisect
+ on crash on amd64 and arm64.
+ 
+ On the amd64 crash side, I have identified that [3] applied in isolation 
(cherry-picked) is sufficient on amd64
+ [3] 
https://github.com/crash-utility/crash/commit/6752571d8d782d07537a258a1ec8919ebd1308ad
+ 
+ I have also found that [4] applied in isolation (cherry-picked) resolves the 
issue on arm64 hardware in testflinger (using the machine agent penguru)
+ [4] 
https://github.com/crash-utility/crash/commit/3879e9104826d5ae14a0824ec47ab60056a249a7
+ 
+ However, this was insufficient to open a customer supplied crash dump.
+ To open the customer's dumpfile, the bisect pointed to
+ https://github.com/crash-
+ utility/crash/commit/968debd0d5979dd9ddca3af0766bad714dbd51e3 as the
+ first commit where everything works as expected. Unfortunately, this
+ does not cleanly apply and some work needs to be done to determine what
+ additional patches and / or custom modifications are needed.
+ 
+ [Test Plan]
+ 
+ * Ensure that with the proposed combination of Makedumpfile and crash is
+ capable of generating and subsequently opening crashdumps on the latest
+ HWE kernels as well as the GA kernels on arm64 and amd64 (ATOW: 6.14 and
+ 6.18, respectively). If bugs are found in generating and reading
+ crashdumps on the HWE kernel on other architectures (s390x, etc.), this
+ test plan can be expanded to include those.
+ 
+ [Where Problems Could Occur]
+ * Crash and Makedumpfile are designed to be backwards-compatible, so the risk 
of regression is low - however, not zero. This is why it will be important to 
ensure that the proposed combination of Makedumpfile and crash does not break 
existing environments - for example the GA kernel
+ 
+ * The matrix of hardware and kernel versions (including derivative /
+ cloud kernels) to test again is extensive. It's possible that the
+ commits identified to solve the known problems will not be
+ comprehensive. For example, a different cpu architecture with a
+ different kernel may require additional commits to be backported.
+ 
+ [Other Info]
+ 
+ * Support/SEG are currently having conversations with the kernel team
+ about the potential to proactively SRU / MRE the latest upstream crash
+ version, and potentially Makedumpfile as well, alongside -hwe kernel
+ releases to avoid this sort of regression in the future. Though, we
+ understand this would require an SRUExceptionPolicy to be approved and
+ published.
+ 
+ 
+ 
+ Original Description:
+ =====================
+ 
  24.04 LTS,
  Linux 6.14.0-29-generic #29~24.04.1-Ubuntu SMP PREEMPT_DYNAMIC Thu Aug 14 
16:52:50 UTC 2 x86_64 x86_64 x86_64 GNU/Linux
  
  Problem Description:
  crash utility is crashing (error code 1) when attempting to analyze kernel 
crash dumps.
  
  Setup kdump & generated kernel panic using “echo 1 >
  /proc/sys/kernel/sysrq” but, crash cannot access it:
  
  # crash /usr/lib/debug/boot/vmlinux-6.14.0-29-generic
  dump.202509161821
  
  crash 8.0.4
  Copyright (C) 2002-2022  Red Hat, Inc.
  Copyright (C) 2004, 2005, 2006, 2010  IBM Corporation
  Copyright (C) 1999-2006  Hewlett-Packard Co
  Copyright (C) 2005, 2006, 2011, 2012  Fujitsu Limited
  Copyright (C) 2006, 2007  VA Linux Systems Japan K.K.
  Copyright (C) 2005, 2011, 2020-2022  NEC Corporation
  Copyright (C) 1999, 2002, 2007  Silicon Graphics, Inc.
  Copyright (C) 1999, 2000, 2001, 2002  Mission Critical Linux, Inc.
  Copyright (C) 2015, 2021  VMware, Inc.
  This program is free software, covered by the GNU General Public License,
  and you are welcome to change it and/or distribute copies of it under
  certain conditions.  Enter "help copying" to see the conditions.
  This program has absolutely no warranty.  Enter "help warranty" for details.
-  
- GNU gdb (GDB) 10.2                             
+ 
+ GNU gdb (GDB) 10.2
  Copyright (C) 2021 Free Software Foundation, Inc.
  License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
  This is free software: you are free to change and redistribute it.
  There is NO WARRANTY, to the extent permitted by law.
  Type "show copying" and "show warranty" for details.
  This GDB was configured as "x86_64-pc-linux-gnu".
  Type "show configuration" for configuration details.
  Find the GDB manual and other documentation resources online at:
-     <http://www.gnu.org/software/gdb/documentation/>.
+     <http://www.gnu.org/software/gdb/documentation/>.
  
  For help, type "help".
  Type "apropos word" to search for commands related to "word"...
  
  # echo $?
  1
  
  running as root user and file is readable fine:
- 
  
  $ :/var/crash/202509161821# ls -l
  total 299144
  -rw------- 1 root whoopsie    119627 Sep 16 18:21 dmesg.202509161821
  -rw-r--r-- 1 root whoopsie 306200163 Sep 16 18:21 dump.202509161821
- 
  
  symbol file is there:
  
  # ls -l /usr/lib/debug/boot/vmlinux-6.14.0-29-generic*
  -rw-r--r-- 1 root root 450705920 Aug 14 18:02 
/usr/lib/debug/boot/vmlinux-6.14.0-29-generic
  
  tail of strace:
  
  14:06:20.661240 rt_sigaction(SIGPIPE, {sa_handler=SIG_IGN, sa_mask=[], 
sa_flags=SA_RESTORER|SA_NODEFER, sa_restorer=0x7b0841845330}, NULL, 8) = 0 
<0.000008>
  14:06:20.661281 rt_sigaction(SIGINT, {sa_handler=0x5ec383cbceb0, sa_mask=[], 
sa_flags=SA_RESTORER|SA_NODEFER, sa_restorer=0x7b0841845330}, NULL, 8) = 0 
<0.000008>
  14:06:20.661322 rt_sigaction(SIGSEGV, {sa_handler=SIG_DFL, sa_mask=[], 
sa_flags=SA_RESTORER|SA_NODEFER, sa_restorer=0x7b0841845330}, NULL, 8) = 0 
<0.000008>
  14:06:20.661360 write(1, "\n", 1
  )       = 1 <0.000119>
  14:06:20.661579 lseek(3, 10312, SEEK_SET) = 10312 <0.000010>
  14:06:20.661617 read(3, "OSRELEASE=6.14.0-29-generic\nBUIL"..., 3276) = 3276 
<0.000011>
  14:06:20.661748 unlink("/var/tmp/ramdump_elf_XXXXXX") = -1 ENOENT (No such 
file or directory) <0.002921>
  14:06:20.664817 exit_group(1)           = ?
  14:06:20.690105 +++ exited with 1 +++
- 
  
  full crash strace https://filebin.net/custom-bin/crash.strace.1
  
  ProblemType: Bug
  DistroRelease: Ubuntu 24.04
  Package: crash 8.0.4-1ubuntu2
  ProcVersionSignature: Ubuntu 6.14.0-29.29~24.04.1-generic 6.14.8
  Uname: Linux 6.14.0-29-generic x86_64
  ApportVersion: 2.28.1-0ubuntu3.8
  Architecture: amd64
  CasperMD5CheckResult: pass
  Date: Thu Sep 18 20:21:26 2025
  InstallationDate: Installed on 2025-09-04 (14 days ago)
  InstallationMedia: Ubuntu 24.04.2 LTS "Noble Numbat" - Release amd64 
(20250215)
  ProcEnviron:
-  LANG=en_US.UTF-8
-  PATH=(custom, no user)
-  SHELL=/bin/bash
-  TERM=xterm-256color
+  LANG=en_US.UTF-8
+  PATH=(custom, no user)
+  SHELL=/bin/bash
+  TERM=xterm-256color
  SourcePackage: crash
  UpgradeStatus: No upgrade log present (probably fresh install)

** Description changed:

- [WIP] [SRU] Makedumpfile: Errors and Page Exclusions When Opening Kernel
- Crashdump Files Generated on the Latest HWE Kernel
- 
- Note: Work in progress
+ Note: SRU is a work in progress as we need to figure out the smallest
+ required subset of commits on the crash-side of things to resolve the
+ issue in environments where this is known to occur
+ 
+ Note: Original description is at the bottom of this report
  
  [Impact]
  
  The current versions of Makedumpfile and Crash in the -updates pocket on
  Noble do not support the latest hardware enablement kernel for that
  platform, which is 6.14. There are several architecture-dependent and
  kernel flavor-dependent behaviours that I will outline below, but the
  steps to reproduce are the same.
  
  Reproducer steps:
  -----------------
  
  Boot into a hardware enablement kernel. For example, on arm64 use the
  6.14.0-1008-nvidia-64k kernel:
  
  KERNEL_VERSION=6.14.0-1008-nvidia-64k
  DISTRO=noble
  
  sudo apt update
  sudo apt install ubuntu-dbgsym-keyring
  echo "deb http://ddebs.ubuntu.com ${DISTRO} main restricted universe 
multiverse
  deb http://ddebs.ubuntu.com ${DISTRO}-updates main restricted universe 
multiverse | \
-   sudo tee /etc/apt/sources.list.d/ddebs.list
+   sudo tee /etc/apt/sources.list.d/ddebs.list
  sudo apt update
  sudo apt install linux-image-${KERNEL_VERSION}
  sudo apt install linux-image-unsigned-${KERNEL_VERSION}-dbgsym
  
  Modify grub's cmdline to specify a crashkernel: 
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash crashkernel=512M" # Or similar
  sudo update-grub
  sudo apt install kexec-tools kdump-tools crash makedumpfile
  sudo systemctl enable kdump-tools
  sudo systemctl start kdump-tools
  sudo reboot
  
  echo c | sudo tee /proc/sysrq-trigger
  
  Results on Arm64
  ----------------
  
  After the machine recovers,
  
  crash /usr/lib/debug/boot/vmlinux-6.14.0-1008-nvidia-64k
  /var/crash/<dump-dir>/<dump-file>
  
  crash 8.0.4
  Copyright (C) 2002-2022  Red Hat, Inc.
  ...
  For help, type "help".
  Type "apropos word" to search for commands related to "word"...
  
  please wait... (gathering task table data)
  crash: page excluded: kernel virtual address: ffff07ffa042d8e0  type: 
"xa_node.slots[off]"
  
  Results on amd64
  ----------------
  
  On an amd64 machine, using a kernel such as linux-
  image-6.14.0-29-generic results in crash failing to open. No error is
  printed but we don't obtain the prompt:
  
  crash /usr/lib/debug/boot/vmlinux-6.14.0-29-generic
  /var/crash/202509112049/dump.202509112049
  
  crash 8.0.4
  ...
  For help, type "help".
  Type "apropos word" to search for commands related to "word"...
  
  # Program exits and no prompt is presented
  
- 
  At the time of writing, we have identified that on the Makedumpfile at least 
two commits are needed:
  [1] 
https://github.com/makedumpfile/makedumpfile/commit/985e575253f1c2de8d6876cfe685c68a24ee06e1
  [2] 
https://github.com/makedumpfile/makedumpfile/commit/bad2a7c4fa75d37a41578441468584963028bdda
  
  These are patches to compensate for a change in the kernel's mapping of
  memory. Using the patched Makedumpfile helps, but it is not sufficient.
  Including the patches in Makedumpfile (or using the tip of upstream
  master), but opening with the currently distributed crash results in
  other non-fatal errors:
  
  eg. Patched Makedumpfile with crash 8.0.4 on Arm64:
  crash 8.0.4
  ...
  WARNING: cannot determine starting stack frame for task ffffd574e21b4800
  
  WARNING: cannot determine starting stack frame for task ffff07ff83296300
  
  WARNING: cannot determine starting stack frame for task ffff07ff83293f80
  
  WARNING: cannot determine starting stack frame for task ffff07ff83a04700
  
  WARNING: cannot determine starting stack frame for task ffff08010507c400
-       KERNEL: /usr/lib/debug/boot/vmlinux-6.14.0-1008-nvidia-64k
-     DUMPFILE: /var/crash/patched_mdf/dump.202509191531  [PARTIAL DUMP]
-         CPUS: 128 [OFFLINE: 127]
-         DATE: Thu Jan  1 00:00:00 UTC 1970
-       UPTIME: 00:13:38
+       KERNEL: /usr/lib/debug/boot/vmlinux-6.14.0-1008-nvidia-64k
+     DUMPFILE: /var/crash/patched_mdf/dump.202509191531  [PARTIAL DUMP]
+         CPUS: 128 [OFFLINE: 127]
+         DATE: Thu Jan  1 00:00:00 UTC 1970
+       UPTIME: 00:13:38
  LOAD AVERAGE: 0.12, 0.16, 0.10
-        TASKS: 1573
-     NODENAME: penguru
-      RELEASE: 6.14.0-1008-nvidia-64k
-      VERSION: #8-Ubuntu SMP PREEMPT_DYNAMIC Sat Jul 26 02:43:53 UTC 2025
-      MACHINE: aarch64  (unknown Mhz)
-       MEMORY: 63.8 GB
-        PANIC: "Kernel panic - not syncing: sysrq triggered crash"
-          PID: 7886
-      COMMAND: "tee"
-         TASK: ffff08010507c400  [THREAD_INFO: ffff08010507c400]
-          CPU: 85
-        STATE: TASK_RUNNING (PANIC)
+        TASKS: 1573
+     NODENAME: penguru
+      RELEASE: 6.14.0-1008-nvidia-64k
+      VERSION: #8-Ubuntu SMP PREEMPT_DYNAMIC Sat Jul 26 02:43:53 UTC 2025
+      MACHINE: aarch64  (unknown Mhz)
+       MEMORY: 63.8 GB
+        PANIC: "Kernel panic - not syncing: sysrq triggered crash"
+          PID: 7886
+      COMMAND: "tee"
+         TASK: ffff08010507c400  [THREAD_INFO: ffff08010507c400]
+          CPU: 85
+        STATE: TASK_RUNNING (PANIC)
  
  On Amd64, crash still fails to open.
  
  Therefore, in addition to the above Makedumpfile commits, crash requires
  some patching. With the above two commits to Makedumpfile I did a bisect
  on crash on amd64 and arm64.
  
  On the amd64 crash side, I have identified that [3] applied in isolation 
(cherry-picked) is sufficient on amd64
  [3] 
https://github.com/crash-utility/crash/commit/6752571d8d782d07537a258a1ec8919ebd1308ad
  
  I have also found that [4] applied in isolation (cherry-picked) resolves the 
issue on arm64 hardware in testflinger (using the machine agent penguru)
  [4] 
https://github.com/crash-utility/crash/commit/3879e9104826d5ae14a0824ec47ab60056a249a7
  
  However, this was insufficient to open a customer supplied crash dump.
  To open the customer's dumpfile, the bisect pointed to
  https://github.com/crash-
  utility/crash/commit/968debd0d5979dd9ddca3af0766bad714dbd51e3 as the
  first commit where everything works as expected. Unfortunately, this
  does not cleanly apply and some work needs to be done to determine what
  additional patches and / or custom modifications are needed.
  
  [Test Plan]
  
  * Ensure that with the proposed combination of Makedumpfile and crash is
  capable of generating and subsequently opening crashdumps on the latest
  HWE kernels as well as the GA kernels on arm64 and amd64 (ATOW: 6.14 and
  6.18, respectively). If bugs are found in generating and reading
  crashdumps on the HWE kernel on other architectures (s390x, etc.), this
  test plan can be expanded to include those.
  
  [Where Problems Could Occur]
  * Crash and Makedumpfile are designed to be backwards-compatible, so the risk 
of regression is low - however, not zero. This is why it will be important to 
ensure that the proposed combination of Makedumpfile and crash does not break 
existing environments - for example the GA kernel
  
  * The matrix of hardware and kernel versions (including derivative /
  cloud kernels) to test again is extensive. It's possible that the
  commits identified to solve the known problems will not be
  comprehensive. For example, a different cpu architecture with a
  different kernel may require additional commits to be backported.
  
  [Other Info]
  
  * Support/SEG are currently having conversations with the kernel team
  about the potential to proactively SRU / MRE the latest upstream crash
  version, and potentially Makedumpfile as well, alongside -hwe kernel
  releases to avoid this sort of regression in the future. Though, we
  understand this would require an SRUExceptionPolicy to be approved and
  published.
- 
- 
  
  Original Description:
  =====================
  
  24.04 LTS,
  Linux 6.14.0-29-generic #29~24.04.1-Ubuntu SMP PREEMPT_DYNAMIC Thu Aug 14 
16:52:50 UTC 2 x86_64 x86_64 x86_64 GNU/Linux
  
  Problem Description:
  crash utility is crashing (error code 1) when attempting to analyze kernel 
crash dumps.
  
  Setup kdump & generated kernel panic using “echo 1 >
  /proc/sys/kernel/sysrq” but, crash cannot access it:
  
  # crash /usr/lib/debug/boot/vmlinux-6.14.0-29-generic
  dump.202509161821
  
  crash 8.0.4
  Copyright (C) 2002-2022  Red Hat, Inc.
  Copyright (C) 2004, 2005, 2006, 2010  IBM Corporation
  Copyright (C) 1999-2006  Hewlett-Packard Co
  Copyright (C) 2005, 2006, 2011, 2012  Fujitsu Limited
  Copyright (C) 2006, 2007  VA Linux Systems Japan K.K.
  Copyright (C) 2005, 2011, 2020-2022  NEC Corporation
  Copyright (C) 1999, 2002, 2007  Silicon Graphics, Inc.
  Copyright (C) 1999, 2000, 2001, 2002  Mission Critical Linux, Inc.
  Copyright (C) 2015, 2021  VMware, Inc.
  This program is free software, covered by the GNU General Public License,
  and you are welcome to change it and/or distribute copies of it under
  certain conditions.  Enter "help copying" to see the conditions.
  This program has absolutely no warranty.  Enter "help warranty" for details.
  
  GNU gdb (GDB) 10.2
  Copyright (C) 2021 Free Software Foundation, Inc.
  License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
  This is free software: you are free to change and redistribute it.
  There is NO WARRANTY, to the extent permitted by law.
  Type "show copying" and "show warranty" for details.
  This GDB was configured as "x86_64-pc-linux-gnu".
  Type "show configuration" for configuration details.
  Find the GDB manual and other documentation resources online at:
      <http://www.gnu.org/software/gdb/documentation/>.
  
  For help, type "help".
  Type "apropos word" to search for commands related to "word"...
  
  # echo $?
  1
  
  running as root user and file is readable fine:
  
  $ :/var/crash/202509161821# ls -l
  total 299144
  -rw------- 1 root whoopsie    119627 Sep 16 18:21 dmesg.202509161821
  -rw-r--r-- 1 root whoopsie 306200163 Sep 16 18:21 dump.202509161821
  
  symbol file is there:
  
  # ls -l /usr/lib/debug/boot/vmlinux-6.14.0-29-generic*
  -rw-r--r-- 1 root root 450705920 Aug 14 18:02 
/usr/lib/debug/boot/vmlinux-6.14.0-29-generic
  
  tail of strace:
  
  14:06:20.661240 rt_sigaction(SIGPIPE, {sa_handler=SIG_IGN, sa_mask=[], 
sa_flags=SA_RESTORER|SA_NODEFER, sa_restorer=0x7b0841845330}, NULL, 8) = 0 
<0.000008>
  14:06:20.661281 rt_sigaction(SIGINT, {sa_handler=0x5ec383cbceb0, sa_mask=[], 
sa_flags=SA_RESTORER|SA_NODEFER, sa_restorer=0x7b0841845330}, NULL, 8) = 0 
<0.000008>
  14:06:20.661322 rt_sigaction(SIGSEGV, {sa_handler=SIG_DFL, sa_mask=[], 
sa_flags=SA_RESTORER|SA_NODEFER, sa_restorer=0x7b0841845330}, NULL, 8) = 0 
<0.000008>
  14:06:20.661360 write(1, "\n", 1
  )       = 1 <0.000119>
  14:06:20.661579 lseek(3, 10312, SEEK_SET) = 10312 <0.000010>
  14:06:20.661617 read(3, "OSRELEASE=6.14.0-29-generic\nBUIL"..., 3276) = 3276 
<0.000011>
  14:06:20.661748 unlink("/var/tmp/ramdump_elf_XXXXXX") = -1 ENOENT (No such 
file or directory) <0.002921>
  14:06:20.664817 exit_group(1)           = ?
  14:06:20.690105 +++ exited with 1 +++
  
  full crash strace https://filebin.net/custom-bin/crash.strace.1
  
  ProblemType: Bug
  DistroRelease: Ubuntu 24.04
  Package: crash 8.0.4-1ubuntu2
  ProcVersionSignature: Ubuntu 6.14.0-29.29~24.04.1-generic 6.14.8
  Uname: Linux 6.14.0-29-generic x86_64
  ApportVersion: 2.28.1-0ubuntu3.8
  Architecture: amd64
  CasperMD5CheckResult: pass
  Date: Thu Sep 18 20:21:26 2025
  InstallationDate: Installed on 2025-09-04 (14 days ago)
  InstallationMedia: Ubuntu 24.04.2 LTS "Noble Numbat" - Release amd64 
(20250215)
  ProcEnviron:
   LANG=en_US.UTF-8
   PATH=(custom, no user)
   SHELL=/bin/bash
   TERM=xterm-256color
  SourcePackage: crash
  UpgradeStatus: No upgrade log present (probably fresh install)

** Summary changed:

- Crash utility exits with error code 1 when analyzing kernel crash
+ [WIP] [SRU] Makedumpfile: Errors and Page Exclusions When Opening Kernel 
Crashdump Files Generated on the Latest HWE Kernel

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to crash in Ubuntu.
https://bugs.launchpad.net/bugs/2125145

Title:
  [WIP] [SRU] Makedumpfile: Errors and Page Exclusions When Opening
  Kernel Crashdump Files Generated on the Latest HWE Kernel

Status in crash package in Ubuntu:
  Confirmed
Status in makedumpfile package in Ubuntu:
  Confirmed

Bug description:
  Note: This SRU is a work in progress; we still need to determine the
  smallest subset of crash commits that resolves the issue in the
  environments where it is known to occur.

  Note: Original description is at the bottom of this report

  [Impact]

  The current versions of Makedumpfile and Crash in the -updates pocket
  on Noble do not support the latest hardware enablement (HWE) kernel for
  that release, which is 6.14. The symptoms depend on architecture and
  kernel flavour, as outlined below, but the steps to reproduce are the
  same.

  Reproducer steps:
  -----------------

  Boot into a hardware enablement kernel. For example, on arm64 use the
  6.14.0-1008-nvidia-64k kernel:

  KERNEL_VERSION=6.14.0-1008-nvidia-64k
  DISTRO=noble

  sudo apt update
  sudo apt install ubuntu-dbgsym-keyring
  echo "deb http://ddebs.ubuntu.com ${DISTRO} main restricted universe multiverse
  deb http://ddebs.ubuntu.com ${DISTRO}-updates main restricted universe multiverse" | \
    sudo tee /etc/apt/sources.list.d/ddebs.list
  sudo apt update
  sudo apt install linux-image-${KERNEL_VERSION}
  sudo apt install linux-image-unsigned-${KERNEL_VERSION}-dbgsym

  Modify grub's cmdline in /etc/default/grub to specify a crashkernel:
  GRUB_CMDLINE_LINUX_DEFAULT="quiet splash crashkernel=512M" # Or similar
  sudo update-grub
  sudo apt install kexec-tools kdump-tools crash makedumpfile
  sudo systemctl enable kdump-tools
  sudo systemctl start kdump-tools
  sudo reboot

  echo c | sudo tee /proc/sysrq-trigger
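
  Before triggering the panic, it may help to confirm that the
  crashkernel parameter actually reached the kernel command line. The
  following is a minimal sketch and not part of the original reproducer:
  the helper name has_crashkernel is hypothetical, and it only inspects
  a command line string such as the contents of /proc/cmdline.

```shell
#!/bin/sh
# Hypothetical helper (not from the original report): succeeds if the
# given kernel command line contains a crashkernel= parameter.
has_crashkernel() {
    case " $1 " in
        *" crashkernel="*) return 0 ;;
        *) return 1 ;;
    esac
}

# Check the running kernel when /proc/cmdline is readable.
if [ -r /proc/cmdline ]; then
    if has_crashkernel "$(cat /proc/cmdline)"; then
        echo "crashkernel parameter present"
    else
        echo "crashkernel parameter missing; kdump is unlikely to capture a dump"
    fi
fi
```

  On Ubuntu, "kdump-config show" and /sys/kernel/kexec_crash_loaded give
  a further indication of whether the crash kernel has been loaded.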

  Results on Arm64
  ----------------

  After the machine recovers, open the dump with crash:

  crash /usr/lib/debug/boot/vmlinux-6.14.0-1008-nvidia-64k \
    /var/crash/<dump-dir>/<dump-file>

  crash 8.0.4
  Copyright (C) 2002-2022  Red Hat, Inc.
  ...
  For help, type "help".
  Type "apropos word" to search for commands related to "word"...

  please wait... (gathering task table data)
  crash: page excluded: kernel virtual address: ffff07ffa042d8e0  type: "xa_node.slots[off]"

  Results on amd64
  ----------------

  On an amd64 machine, using a kernel such as
  linux-image-6.14.0-29-generic results in crash failing to open. No
  error is printed, but the prompt never appears:

  crash /usr/lib/debug/boot/vmlinux-6.14.0-29-generic \
    /var/crash/202509112049/dump.202509112049

  crash 8.0.4
  ...
  For help, type "help".
  Type "apropos word" to search for commands related to "word"...

  # Program exits and no prompt is presented

  At the time of writing, we have identified that at least two commits
  are needed on the Makedumpfile side:
  [1] https://github.com/makedumpfile/makedumpfile/commit/985e575253f1c2de8d6876cfe685c68a24ee06e1
  [2] https://github.com/makedumpfile/makedumpfile/commit/bad2a7c4fa75d37a41578441468584963028bdda

  These patches compensate for a change in the kernel's mapping of
  memory. Using the patched Makedumpfile helps, but it is not
  sufficient. Generating the dump with the patched Makedumpfile (or the
  tip of upstream master) and opening it with the currently distributed
  crash still produces other, non-fatal errors:

  e.g. patched Makedumpfile with crash 8.0.4 on Arm64:
  crash 8.0.4
  ...
  WARNING: cannot determine starting stack frame for task ffffd574e21b4800

  WARNING: cannot determine starting stack frame for task ffff07ff83296300

  WARNING: cannot determine starting stack frame for task ffff07ff83293f80

  WARNING: cannot determine starting stack frame for task ffff07ff83a04700

  WARNING: cannot determine starting stack frame for task ffff08010507c400
        KERNEL: /usr/lib/debug/boot/vmlinux-6.14.0-1008-nvidia-64k
      DUMPFILE: /var/crash/patched_mdf/dump.202509191531  [PARTIAL DUMP]
          CPUS: 128 [OFFLINE: 127]
          DATE: Thu Jan  1 00:00:00 UTC 1970
        UPTIME: 00:13:38
  LOAD AVERAGE: 0.12, 0.16, 0.10
         TASKS: 1573
      NODENAME: penguru
       RELEASE: 6.14.0-1008-nvidia-64k
       VERSION: #8-Ubuntu SMP PREEMPT_DYNAMIC Sat Jul 26 02:43:53 UTC 2025
       MACHINE: aarch64  (unknown Mhz)
        MEMORY: 63.8 GB
         PANIC: "Kernel panic - not syncing: sysrq triggered crash"
           PID: 7886
       COMMAND: "tee"
          TASK: ffff08010507c400  [THREAD_INFO: ffff08010507c400]
           CPU: 85
         STATE: TASK_RUNNING (PANIC)

  On amd64, crash still fails to open.

  Therefore, in addition to the above Makedumpfile commits, crash
  requires some patching. With the above two Makedumpfile commits
  applied, I bisected crash on amd64 and arm64.

  On amd64, I found that cherry-picking [3] in isolation is sufficient:
  [3] https://github.com/crash-utility/crash/commit/6752571d8d782d07537a258a1ec8919ebd1308ad

  On arm64, I found that cherry-picking [4] in isolation resolves the
  issue on hardware in testflinger (using the machine agent penguru):
  [4] https://github.com/crash-utility/crash/commit/3879e9104826d5ae14a0824ec47ab60056a249a7

  However, this was insufficient to open a customer-supplied crash dump.
  To open the customer's dumpfile, the bisect pointed to
  https://github.com/crash-utility/crash/commit/968debd0d5979dd9ddca3af0766bad714dbd51e3
  as the first commit where everything works as expected. Unfortunately,
  this commit does not apply cleanly, and more work is needed to
  determine what additional patches and/or custom modifications are
  required.
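
  The cherry-pick experiments described above can be sketched roughly
  as follows. This is an illustration under assumptions, not the exact
  procedure used: the helper names sha_from_url and try_cherry_pick are
  hypothetical, and the base ref passed to try_cherry_pick (for example
  an upstream ref corresponding to 8.0.4) must be checked against the
  actual repository.

```shell
#!/bin/sh
# Hypothetical helper: print the commit hash at the end of a GitHub
# commit URL (everything after the last slash).
sha_from_url() {
    printf '%s\n' "${1##*/}"
}

# Hypothetical helper: in an existing clone, create a scratch branch at
# a base ref and try to cherry-pick one commit in isolation. Returns
# non-zero (after aborting the cherry-pick) if it does not apply cleanly.
try_cherry_pick() {
    repo_dir=$1
    base_ref=$2
    sha=$3
    git -C "$repo_dir" checkout -q -B "test-$sha" "$base_ref" || return 1
    if ! git -C "$repo_dir" cherry-pick -x "$sha"; then
        git -C "$repo_dir" cherry-pick --abort
        return 1
    fi
}

# Extract the hash of commit [3] from its URL.
sha_from_url "https://github.com/crash-utility/crash/commit/6752571d8d782d07537a258a1ec8919ebd1308ad"
```

  A call such as try_cherry_pick ./crash <base-ref> <sha> would then
  report, through its exit status, whether a given commit applies on
  its own.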

  [Test Plan]

  * Ensure that the proposed combination of Makedumpfile and crash is
  capable of generating and subsequently opening crashdumps on the
  latest HWE kernels as well as the GA kernels on arm64 and amd64 (ATOW:
  6.14 and 6.8, respectively). If bugs are found in generating and
  reading crashdumps on the HWE kernel on other architectures (s390x,
  etc.), this test plan can be expanded to include those.
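
  One way to make this check repeatable is to scan the output of a
  crash session for the failure signatures quoted in this report. The
  sketch below is only an illustration: classify_crash_log is a
  hypothetical helper, and it recognises just the two messages shown
  in the Impact section.

```shell
#!/bin/sh
# Hypothetical helper: read crash output on stdin and print one label
# describing which known failure signature, if any, it contains.
classify_crash_log() {
    log=$(cat)
    case $log in
        *"page excluded"*) echo "page-excluded" ;;
        *"cannot determine starting stack frame"*) echo "stack-frame-warnings" ;;
        *) echo "no-known-signature" ;;
    esac
}

# Example: the Arm64 error message from this report.
printf '%s\n' 'crash: page excluded: kernel virtual address: ffff07ffa042d8e0  type: "xa_node.slots[off]"' | classify_crash_log
```

  A "no-known-signature" result does not by itself mean success: the
  amd64 failure prints no error at all, so the exit status of crash
  needs to be checked separately (for example by saving its output to
  a file first rather than piping it directly).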

  [Where Problems Could Occur]

  * Crash and Makedumpfile are designed to be backwards-compatible, so
  the risk of regression is low but not zero. It will therefore be
  important to ensure that the proposed combination of Makedumpfile and
  crash does not break existing environments, for example the GA kernel.

  * The matrix of hardware and kernel versions (including derivative /
  cloud kernels) to test against is extensive. It's possible that the
  commits identified to solve the known problems will not be
  comprehensive. For example, a different CPU architecture with a
  different kernel may require additional commits to be backported.

  [Other Info]

  * Support/SEG are currently having conversations with the kernel team
  about the potential to proactively SRU / MRE the latest upstream crash
  version, and potentially Makedumpfile as well, alongside -hwe kernel
  releases to avoid this sort of regression in the future. We
  understand, though, that this would require an SRUExceptionPolicy to
  be approved and published.

  Original Description:
  =====================

  24.04 LTS,
  Linux 6.14.0-29-generic #29~24.04.1-Ubuntu SMP PREEMPT_DYNAMIC Thu Aug 14 16:52:50 UTC 2 x86_64 x86_64 x86_64 GNU/Linux

  Problem Description:
  The crash utility exits with error code 1 when attempting to analyze
  kernel crash dumps.

  Set up kdump and generated a kernel panic using “echo 1 >
  /proc/sys/kernel/sysrq”, but crash cannot open the dump:

  # crash /usr/lib/debug/boot/vmlinux-6.14.0-29-generic dump.202509161821

  crash 8.0.4
  Copyright (C) 2002-2022  Red Hat, Inc.
  Copyright (C) 2004, 2005, 2006, 2010  IBM Corporation
  Copyright (C) 1999-2006  Hewlett-Packard Co
  Copyright (C) 2005, 2006, 2011, 2012  Fujitsu Limited
  Copyright (C) 2006, 2007  VA Linux Systems Japan K.K.
  Copyright (C) 2005, 2011, 2020-2022  NEC Corporation
  Copyright (C) 1999, 2002, 2007  Silicon Graphics, Inc.
  Copyright (C) 1999, 2000, 2001, 2002  Mission Critical Linux, Inc.
  Copyright (C) 2015, 2021  VMware, Inc.
  This program is free software, covered by the GNU General Public License,
  and you are welcome to change it and/or distribute copies of it under
  certain conditions.  Enter "help copying" to see the conditions.
  This program has absolutely no warranty.  Enter "help warranty" for details.

  GNU gdb (GDB) 10.2
  Copyright (C) 2021 Free Software Foundation, Inc.
  License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
  This is free software: you are free to change and redistribute it.
  There is NO WARRANTY, to the extent permitted by law.
  Type "show copying" and "show warranty" for details.
  This GDB was configured as "x86_64-pc-linux-gnu".
  Type "show configuration" for configuration details.
  Find the GDB manual and other documentation resources online at:
      <http://www.gnu.org/software/gdb/documentation/>.

  For help, type "help".
  Type "apropos word" to search for commands related to "word"...

  # echo $?
  1

  running as root user and file is readable fine:

  $ :/var/crash/202509161821# ls -l
  total 299144
  -rw------- 1 root whoopsie    119627 Sep 16 18:21 dmesg.202509161821
  -rw-r--r-- 1 root whoopsie 306200163 Sep 16 18:21 dump.202509161821

  symbol file is there:

  # ls -l /usr/lib/debug/boot/vmlinux-6.14.0-29-generic*
  -rw-r--r-- 1 root root 450705920 Aug 14 18:02 /usr/lib/debug/boot/vmlinux-6.14.0-29-generic

  tail of strace:

  14:06:20.661240 rt_sigaction(SIGPIPE, {sa_handler=SIG_IGN, sa_mask=[], sa_flags=SA_RESTORER|SA_NODEFER, sa_restorer=0x7b0841845330}, NULL, 8) = 0 <0.000008>
  14:06:20.661281 rt_sigaction(SIGINT, {sa_handler=0x5ec383cbceb0, sa_mask=[], sa_flags=SA_RESTORER|SA_NODEFER, sa_restorer=0x7b0841845330}, NULL, 8) = 0 <0.000008>
  14:06:20.661322 rt_sigaction(SIGSEGV, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=SA_RESTORER|SA_NODEFER, sa_restorer=0x7b0841845330}, NULL, 8) = 0 <0.000008>
  14:06:20.661360 write(1, "\n", 1
  )       = 1 <0.000119>
  14:06:20.661579 lseek(3, 10312, SEEK_SET) = 10312 <0.000010>
  14:06:20.661617 read(3, "OSRELEASE=6.14.0-29-generic\nBUIL"..., 3276) = 3276 <0.000011>
  14:06:20.661748 unlink("/var/tmp/ramdump_elf_XXXXXX") = -1 ENOENT (No such file or directory) <0.002921>
  14:06:20.664817 exit_group(1)           = ?
  14:06:20.690105 +++ exited with 1 +++

  full crash strace https://filebin.net/custom-bin/crash.strace.1

  ProblemType: Bug
  DistroRelease: Ubuntu 24.04
  Package: crash 8.0.4-1ubuntu2
  ProcVersionSignature: Ubuntu 6.14.0-29.29~24.04.1-generic 6.14.8
  Uname: Linux 6.14.0-29-generic x86_64
  ApportVersion: 2.28.1-0ubuntu3.8
  Architecture: amd64
  CasperMD5CheckResult: pass
  Date: Thu Sep 18 20:21:26 2025
  InstallationDate: Installed on 2025-09-04 (14 days ago)
  InstallationMedia: Ubuntu 24.04.2 LTS "Noble Numbat" - Release amd64 (20250215)
  ProcEnviron:
   LANG=en_US.UTF-8
   PATH=(custom, no user)
   SHELL=/bin/bash
   TERM=xterm-256color
  SourcePackage: crash
  UpgradeStatus: No upgrade log present (probably fresh install)

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/crash/+bug/2125145/+subscriptions


