** Summary changed:

- Crash utility exits with error code 1 when analyzing kernel crash
+ [WIP] [SRU] Makedumpfile: Errors and Page Exclusions When Opening Kernel Crashdump Files Generated on the Latest HWE Kernel

-- 
You received this bug notification because you are a member of Kernel Packages, which is subscribed to crash in Ubuntu.
https://bugs.launchpad.net/bugs/2125145

Title:
  [WIP] [SRU] Makedumpfile: Errors and Page Exclusions When Opening Kernel Crashdump Files Generated on the Latest HWE Kernel

Status in crash package in Ubuntu:
  Confirmed
Status in makedumpfile package in Ubuntu:
  Confirmed

Bug description:

Note: SRU is a work in progress as we need to figure out the smallest required subset of commits on the crash-side of things to resolve the issue in environments where this is known to occur

Note: Original description is at the bottom of this report

[Impact]

The current versions of Makedumpfile and Crash in the -updates pocket on Noble do not support the latest hardware enablement kernel for that platform, which is 6.14. There are several architecture-dependent and kernel flavor-dependent behaviours that I will outline below, but the steps to reproduce are the same.

Reproducer steps:
-----------------

Boot into a hardware enablement kernel.
For example, on arm64 use the 6.14.0-1008-nvidia-64k kernel:

KERNEL_VERSION=6.14.0-1008-nvidia-64k
DISTRO=noble

sudo apt update
sudo apt install ubuntu-dbgsym-keyring
echo "deb http://ddebs.ubuntu.com ${DISTRO} main restricted universe multiverse
deb http://ddebs.ubuntu.com ${DISTRO}-updates main restricted universe multiverse" | \
sudo tee /etc/apt/sources.list.d/ddebs.list
sudo apt update
sudo apt install linux-image-${KERNEL_VERSION}
sudo apt install linux-image-unsigned-${KERNEL_VERSION}-dbgsym

Modify GRUB's kernel command line (GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub) to specify a crashkernel reservation:

GRUB_CMDLINE_LINUX_DEFAULT="quiet splash crashkernel=512M" # Or similar

sudo update-grub
sudo apt install kexec-tools kdump-tools crash makedumpfile
sudo systemctl enable kdump-tools
sudo systemctl start kdump-tools
sudo reboot

Then trigger a crash:

echo c | sudo tee /proc/sysrq-trigger

Results on Arm64
----------------

After the machine recovers, run:

crash /usr/lib/debug/boot/vmlinux-6.14.0-1008-nvidia-64k /var/crash/<dump-dir>/<dump-file>

crash 8.0.4
Copyright (C) 2002-2022 Red Hat, Inc.
...
For help, type "help".
Type "apropos word" to search for commands related to "word"...

please wait... (gathering task table data)
crash: page excluded: kernel virtual address: ffff07ffa042d8e0 type: "xa_node.slots[off]"

Results on amd64
----------------

On an amd64 machine, using a kernel such as linux-image-6.14.0-29-generic results in crash failing to open. No error is printed, but the prompt is never presented:

crash /usr/lib/debug/boot/vmlinux-6.14.0-29-generic /var/crash/202509112049/dump.202509112049

crash 8.0.4
...
For help, type "help".
Type "apropos word" to search for commands related to "word"...
# Program exits and no prompt is presented

At the time of writing, we have identified that at least two commits are needed on the Makedumpfile side:

[1] https://github.com/makedumpfile/makedumpfile/commit/985e575253f1c2de8d6876cfe685c68a24ee06e1
[2] https://github.com/makedumpfile/makedumpfile/commit/bad2a7c4fa75d37a41578441468584963028bdda

These patches compensate for a change in the kernel's memory mapping. Using the patched Makedumpfile helps, but it is not sufficient. Even with the patches applied to Makedumpfile (or with the tip of upstream master), opening the dump with the currently distributed crash still produces other, non-fatal errors.

For example, patched Makedumpfile with crash 8.0.4 on arm64:

crash 8.0.4
...
WARNING: cannot determine starting stack frame for task ffffd574e21b4800

WARNING: cannot determine starting stack frame for task ffff07ff83296300

WARNING: cannot determine starting stack frame for task ffff07ff83293f80

WARNING: cannot determine starting stack frame for task ffff07ff83a04700

WARNING: cannot determine starting stack frame for task ffff08010507c400

      KERNEL: /usr/lib/debug/boot/vmlinux-6.14.0-1008-nvidia-64k
    DUMPFILE: /var/crash/patched_mdf/dump.202509191531 [PARTIAL DUMP]
        CPUS: 128 [OFFLINE: 127]
        DATE: Thu Jan 1 00:00:00 UTC 1970
      UPTIME: 00:13:38
LOAD AVERAGE: 0.12, 0.16, 0.10
       TASKS: 1573
    NODENAME: penguru
     RELEASE: 6.14.0-1008-nvidia-64k
     VERSION: #8-Ubuntu SMP PREEMPT_DYNAMIC Sat Jul 26 02:43:53 UTC 2025
     MACHINE: aarch64 (unknown Mhz)
      MEMORY: 63.8 GB
       PANIC: "Kernel panic - not syncing: sysrq triggered crash"
         PID: 7886
     COMMAND: "tee"
        TASK: ffff08010507c400 [THREAD_INFO: ffff08010507c400]
         CPU: 85
       STATE: TASK_RUNNING (PANIC)

On amd64, crash still fails to open.

Therefore, crash also needs patching in addition to the Makedumpfile commits above. With those two Makedumpfile commits applied, I bisected crash on both amd64 and arm64.
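A bisect like the one described above needs a repeatable pass/fail check. The following is a minimal bash sketch of one, assuming you have already captured a crash session log and crash's exit status. Capture it however you normally drive crash non-interactively. The function name classify_crash_log is hypothetical. The symptom strings are copied from the outputs quoted in this report, so other dumps may fail in ways this check does not catch.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of crash or makedumpfile): classify a captured
# crash(8) session log using the symptoms quoted in this report.
# Arguments: $1 = path to the captured log, $2 = crash's exit status.
# Prints "broken" or "ok".
classify_crash_log() {
  local log_file="$1"
  local exit_status="$2"

  if [ "$exit_status" -ne 0 ]; then
    # amd64 symptom: crash exits with status 1 without presenting a prompt.
    echo broken
  elif grep -q 'page excluded' "$log_file"; then
    # arm64 symptom with the distributed makedumpfile (xa_node.slots[off]).
    echo broken
  elif grep -q 'cannot determine starting stack frame' "$log_file"; then
    # arm64 symptom with the patched makedumpfile and crash 8.0.4.
    echo broken
  else
    echo ok
  fi
}
```

To use this with "git bisect run", a small wrapper would translate the result into an exit code. The goal here is the first commit where things work, so one option is to start the bisect with swapped terms, e.g. "git bisect start --term-old=broken --term-new=fixed". The wrapper would then exit 0 for "broken" and 1 for "ok". Treating the stack-frame warnings as failures is a judgment call, since crash reports them as non-fatal.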
On the crash side, I have identified that cherry-picking [3] in isolation is sufficient on amd64:

[3] https://github.com/crash-utility/crash/commit/6752571d8d782d07537a258a1ec8919ebd1308ad

I have also found that cherry-picking [4] in isolation resolves the issue on arm64 hardware in testflinger (using the machine agent penguru):

[4] https://github.com/crash-utility/crash/commit/3879e9104826d5ae14a0824ec47ab60056a249a7

However, this was insufficient to open a customer-supplied crash dump. For the customer's dump file, the bisect pointed to https://github.com/crash-utility/crash/commit/968debd0d5979dd9ddca3af0766bad714dbd51e3 as the first commit where everything works as expected. Unfortunately, this commit does not apply cleanly. Some work is needed to determine what additional patches and/or custom modifications are required.

[Test Plan]

* Ensure that the proposed combination of Makedumpfile and crash can generate and subsequently open crashdumps on the latest HWE kernels as well as the GA kernels on arm64 and amd64 (at the time of writing: 6.14 and 6.8, respectively). If bugs are found in generating and reading crashdumps on the HWE kernel on other architectures (s390x, etc.), this test plan can be expanded to include them.

[Where Problems Could Occur]

* Crash and Makedumpfile are designed to be backwards-compatible, so the risk of regression is low, but not zero. It is therefore important to ensure that the proposed combination of Makedumpfile and crash does not break existing environments, for example the GA kernel.

* The matrix of hardware and kernel versions (including derivative / cloud kernels) to test against is extensive. The commits identified to solve the known problems may not be comprehensive. For example, a different CPU architecture with a different kernel may require additional commits to be backported.
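For each kernel in the test matrix, the reproducer relies on two preconditions. The running kernel's command line must carry a crashkernel= reservation, and the matching debug vmlinux must be present under /usr/lib/debug/boot. A minimal bash sketch of such a preflight check follows. The function names are hypothetical. The check only inspects the command-line string and the debug-symbol path used in the reproducer; it does not query kdump-tools itself.

```shell
#!/usr/bin/env bash
# Hypothetical preflight helpers for one entry of the test matrix.

# Succeeds if a kernel command line (e.g. the contents of /proc/cmdline)
# contains a crashkernel= parameter.
has_crashkernel() {
  local cmdline="$1" param
  local -a params
  # Split on whitespace without glob expansion.
  read -ra params <<< "$cmdline"
  for param in "${params[@]}"; do
    case "$param" in
      crashkernel=*) return 0 ;;
    esac
  done
  return 1
}

# Prints the debug vmlinux path passed to crash in this report's examples.
debug_vmlinux_path() {
  printf '/usr/lib/debug/boot/vmlinux-%s\n' "$1"
}

# Runs both checks for one kernel version, prints what is missing,
# and returns non-zero if anything is missing.
preflight() {
  local kver="$1" cmdline="$2" vmlinux status=0
  vmlinux=$(debug_vmlinux_path "$kver")
  if ! has_crashkernel "$cmdline"; then
    echo "missing crashkernel= on the kernel command line"
    status=1
  fi
  if [ ! -r "$vmlinux" ]; then
    echo "missing debug symbols: $vmlinux"
    status=1
  fi
  return "$status"
}
```

On a booted test machine, this could be invoked as: preflight "$(uname -r)" "$(cat /proc/cmdline)"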
[Other Info]

* Support/SEG are currently discussing with the kernel team the potential to proactively SRU / MRE the latest upstream crash version, and potentially Makedumpfile as well, alongside -hwe kernel releases to avoid this sort of regression in the future. We understand, though, that this would require an SRUExceptionPolicy to be approved and published.

Original Description:
=====================

24.04 LTS, Linux 6.14.0-29-generic #29~24.04.1-Ubuntu SMP PREEMPT_DYNAMIC Thu Aug 14 16:52:50 UTC 2 x86_64 x86_64 x86_64 GNU/Linux

Problem Description:
crash utility is crashing (error code 1) when attempting to analyze kernel crash dumps.
Setup kdump & generated kernel panic using “echo 1 > /proc/sys/kernel/sysrq” but, crash cannot access it:

# crash /usr/lib/debug/boot/vmlinux-6.14.0-29-generic dump.202509161821

crash 8.0.4
Copyright (C) 2002-2022 Red Hat, Inc.
Copyright (C) 2004, 2005, 2006, 2010 IBM Corporation
Copyright (C) 1999-2006 Hewlett-Packard Co
Copyright (C) 2005, 2006, 2011, 2012 Fujitsu Limited
Copyright (C) 2006, 2007 VA Linux Systems Japan K.K.
Copyright (C) 2005, 2011, 2020-2022 NEC Corporation
Copyright (C) 1999, 2002, 2007 Silicon Graphics, Inc.
Copyright (C) 1999, 2000, 2001, 2002 Mission Critical Linux, Inc.
Copyright (C) 2015, 2021 VMware, Inc.
This program is free software, covered by the GNU General Public License,
and you are welcome to change it and/or distribute copies of it under
certain conditions. Enter "help copying" to see the conditions.
This program has absolutely no warranty. Enter "help warranty" for details.

GNU gdb (GDB) 10.2
Copyright (C) 2021 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-pc-linux-gnu".
Type "show configuration" for configuration details.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...

# echo $?
1

running as root user and file is readable fine:

$ :/var/crash/202509161821# ls -l
total 299144
-rw------- 1 root whoopsie 119627 Sep 16 18:21 dmesg.202509161821
-rw-r--r-- 1 root whoopsie 306200163 Sep 16 18:21 dump.202509161821

symbol file is there:

# ls -l /usr/lib/debug/boot/vmlinux-6.14.0-29-generic*
-rw-r--r-- 1 root root 450705920 Aug 14 18:02 /usr/lib/debug/boot/vmlinux-6.14.0-29-generic

tail of strace:

14:06:20.661240 rt_sigaction(SIGPIPE, {sa_handler=SIG_IGN, sa_mask=[], sa_flags=SA_RESTORER|SA_NODEFER, sa_restorer=0x7b0841845330}, NULL, 8) = 0 <0.000008>
14:06:20.661281 rt_sigaction(SIGINT, {sa_handler=0x5ec383cbceb0, sa_mask=[], sa_flags=SA_RESTORER|SA_NODEFER, sa_restorer=0x7b0841845330}, NULL, 8) = 0 <0.000008>
14:06:20.661322 rt_sigaction(SIGSEGV, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=SA_RESTORER|SA_NODEFER, sa_restorer=0x7b0841845330}, NULL, 8) = 0 <0.000008>
14:06:20.661360 write(1, "\n", 1 ) = 1 <0.000119>
14:06:20.661579 lseek(3, 10312, SEEK_SET) = 10312 <0.000010>
14:06:20.661617 read(3, "OSRELEASE=6.14.0-29-generic\nBUIL"..., 3276) = 3276 <0.000011>
14:06:20.661748 unlink("/var/tmp/ramdump_elf_XXXXXX") = -1 ENOENT (No such file or directory) <0.002921>
14:06:20.664817 exit_group(1) = ?
14:06:20.690105 +++ exited with 1 +++

full crash strace https://filebin.net/custom-bin/crash.strace.1

ProblemType: Bug
DistroRelease: Ubuntu 24.04
Package: crash 8.0.4-1ubuntu2
ProcVersionSignature: Ubuntu 6.14.0-29.29~24.04.1-generic 6.14.8
Uname: Linux 6.14.0-29-generic x86_64
ApportVersion: 2.28.1-0ubuntu3.8
Architecture: amd64
CasperMD5CheckResult: pass
Date: Thu Sep 18 20:21:26 2025
InstallationDate: Installed on 2025-09-04 (14 days ago)
InstallationMedia: Ubuntu 24.04.2 LTS "Noble Numbat" - Release amd64 (20250215)
ProcEnviron:
 LANG=en_US.UTF-8
 PATH=(custom, no user)
 SHELL=/bin/bash
 TERM=xterm-256color
SourcePackage: crash
UpgradeStatus: No upgrade log present (probably fresh install)

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/crash/+bug/2125145/+subscriptions

-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : [email protected]
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp

