After debugging the problem, a potential workaround was found that alleviates but doesn't fix the issue: pass the "retain_initrd" kernel parameter on kexec boots to prevent the kernel from freeing the initrd memory area. It was also observed that bigger initrds tend to show the problem more consistently.
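As a rough illustration of the workaround above, the following sketch appends "retain_initrd" to the command line passed to the kexec-loaded kernel. The helper name with_retain_initrd and the paths in the usage comment are assumptions made for this example; they do not come from the bug report.

```shell
#!/bin/sh
# Sketch only: with_retain_initrd is a hypothetical helper name.

# Print the given kernel command line with "retain_initrd" appended,
# unless the parameter is already present.
with_retain_initrd() {
    case " $1 " in
        *" retain_initrd "*) printf '%s\n' "$1" ;;
        *) printf '%s\n' "${1:+$1 }retain_initrd" ;;
    esac
}

# Possible usage, as root (kernel and initrd paths are placeholders):
#   kexec -l /boot/vmlinuz --initrd=/boot/initrd.img \
#         --append="$(with_retain_initrd "$(cat /proc/cmdline)")"
#   systemctl kexec
```

Using "--append=" here instead of "--reuse-cmdline" makes the extra parameter explicit; once it is part of the running kernel's command line, "--reuse-cmdline" would carry it over on later kexecs.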
After using pstore/ramoops to collect logs (and ftrace) on failure, and observing the same issue in multiple kernel versions (including mainline) and in other distros, it became clear that the cause was memory corruption. Since kexec is a fast reboot path that does not go through the full BIOS reset, it was conjectured that an adapter not properly shut down on the kexec path could have its firmware issue an invalid memory access, in the form of a DMA write to a previously valid address, effectively corrupting an arbitrary memory region.

It was then noticed that the Amazon ena driver does not have a shutdown handler, which is used on reboot/kexec to properly quiesce devices (through the call chain device_shutdown() -> pci_device_shutdown() -> driver .shutdown() handler, if any). If the driver has no shutdown handler, the PCI layer clears the device's bus master bit in the PCI command register, disabling the adapter. But this operation doesn't quiesce the device's firmware, and on the next boot, when the device gets activated (i.e., its bus master bit gets set), it may perform a buffered memory operation.

Tests on a mainline kernel that performed rmmod of the ena driver before kexec showed that the initrd corruption no longer happened, because rmmod calls ena_remove(), which properly shuts the adapter down before each kexec.

-- 
You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1869948

Title:
  Multiple Kexec in AWS Nitro instances fail

Status in linux package in Ubuntu:
  Confirmed
Status in linux source package in Xenial:
  Confirmed
Status in linux source package in Bionic:
  Confirmed
Status in linux source package in Eoan:
  Confirmed
Status in linux source package in Focal:
  Confirmed

Bug description:
  [Impact]

  * Currently, users cannot perform multiple kernel kexec loads on AWS Nitro instances (KVM-based); after the 2nd or 3rd kexec, an initrd corruption is observed, with the following signature:

  Initramfs unpacking failed: junk within compressed archive
  [...]
  Kernel panic - not syncing: No working init found. Try passing init= option to kernel. See Linux Documentation/admin-guide/init.rst for guidance.
  CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.5.0-rc7-gpiccoli+ #26
  Hardware name: Amazon EC2 t3.large/, BIOS 1.0 10/16/2017
  Call Trace:
   dump_stack+0x6d/0x9a
   ? csum_partial_copy_generic+0x150/0x170
   panic+0x101/0x2e3
   ? do_execve+0x25/0x30
   ? rest_init+0xb0/0xb0
   kernel_init+0xfb/0x100
   ret_from_fork+0x35/0x40

  * After investigation (see comment 2), it was noticed that the Amazon ena network driver doesn't provide a shutdown() handler; hence the device could perform a DMA transaction to a previously valid address during boot, which would then corrupt kernel memory. The following patch was proposed and fixed the issue, allowing 1000 kexecs to be executed successfully with no issues observed: 428c491332bc ("net: ena: Add PCI shutdown handler to allow safe kexec") [ git.kernel.org/linus/428c491332bc ].

  * Hence, we are hereby requesting SRU for this patch. It was tested successfully in all supported series (4.4, 4.15 and 5.3) on Amazon Nitro instances, and was reviewed/acked by the ena driver team and by a kexec developer from another distro.
  It is worth mentioning that we proposed an upstream multi-vendor discussion about this issue: marc.info/?l=kexec&m=158299605013194

  [Test case]

  * The basic test procedure consists of performing multiple kexecs sequentially. AWS does not provide a full console, so in case of failure one can check the instance screenshot, or use pstore/ramoops to collect dmesg in a preserved memory area after a crash. The commands used to perform kexec are:

  kexec -l <kernel file> --initrd <initrd file> --reuse-cmdline
  systemctl kexec

  Alternatively, one could use "--append=" instead of "--reuse-cmdline" if a change in the kexec command line is desired; also, to execute the kexec-loaded kernel, both "kexec -e" and "systemctl kexec" are equally valid.

  * In comment 3 we proposed a script/approach to auto-test kexecs, used here to perform 1000 kexecs with the proposed patch.

  [Regression Potential]

  * Although the patch proposed here introduces a PCI shutdown handler, it keeps the remove handler identical and bases the shutdown strongly on ena_remove(), changing just the netdev handling, following other upstream drivers. It was extensively tested and presented no issues. Also, it is self-contained and affects only one driver, so other cloud providers or non-cloud environments would not be affected by the patch at all.

  * A potential regression could manifest as a delay or issue on the reboot/shutdown path, and only if the ena driver is in use.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1869948/+subscriptions

-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp
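The sequential-kexec procedure from the [Test case] section could be automated roughly as follows. This is a hypothetical sketch, not the script from comment 3: the function names, the counter file location, and the DRY_RUN switch are all assumptions made for illustration. Because each kexec replaces the running kernel, the iteration count has to live in a file that survives the reboot, and the step function is meant to be invoked once per boot (for example from a systemd unit).

```shell
#!/bin/sh
# Hypothetical sketch of a boot-time kexec loop; not the script from comment 3.
# All paths, variable names and function names are assumptions.

COUNT_FILE="${COUNT_FILE:-/var/lib/kexec-loop/count}"   # must survive reboots
MAX_KEXECS="${MAX_KEXECS:-1000}"

# Increment the persistent counter and print the new value.
bump_count() {
    count=0
    [ -r "$COUNT_FILE" ] && read -r count < "$COUNT_FILE"
    count=${count:-0}
    count=$((count + 1))
    printf '%s\n' "$count" > "$COUNT_FILE"
    printf '%s\n' "$count"
}

# Load and boot the given kernel ($1) and initrd ($2).
# With DRY_RUN=1 the commands are only printed, not executed.
kexec_once() {
    run=""
    [ "${DRY_RUN:-0}" = 1 ] && run="echo"
    $run kexec -l "$1" --initrd="$2" --reuse-cmdline &&
    $run systemctl kexec
}

# One iteration: kexec again until MAX_KEXECS boots have been counted.
kexec_loop_step() {
    n=$(bump_count) || return 1
    if [ "$n" -le "$MAX_KEXECS" ]; then
        kexec_once "$1" "$2"
    else
        echo "completed $MAX_KEXECS kexecs"
    fi
}
```

Running it first with DRY_RUN=1 prints the kexec commands without rebooting, which is a cheap way to check the paths. If a boot fails with the initrd corruption signature, the counter file shows how many kexecs succeeded before the failure.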