We've got a serial console log from AWS Support through our Support team
(special thanks to Pedro Principeza and our former colleague Mark Thomas).
The problem is definitely not the ext4/jbd2 patchset as suspected
(although it's unclear how reverting it allowed the kernel to boot;
maybe build environment differences?). Early in the kernel boot, before
the rootfs is even mounted, blocked swapper tasks appear, and they keep
recurring.
(Full log attached.)
```
Starting Reboot...
...
[ 0.000000] Kernel command line: BOOT_IMAGE=/boot/vmlinuz-4.15.0-1113-aws
root=UUID=db937f23-4ed7-4c4b-8058-b23a860fae08 ro console=tty1 console=ttyS0
nvme_core.io_timeout=4294967295
...
[ 0.000000] gran_size: 64K chunk_size: 256M num_reg: 10 lose cover RAM: 737G
...
[ 2.742455] clocksource: Switched to clocksource tsc
[ 242.656089] INFO: task swapper/0:1 blocked for more than 120 seconds.
...
[ 363.488083] INFO: task swapper/0:1 blocked for more than 120 seconds.
...
[ 484.320066] INFO: task swapper/0:1 blocked for more than 120 seconds.
...
[ 605.152061] INFO: task swapper/0:1 blocked for more than 120 seconds.
...
[ 725.984054] INFO: task swapper/0:1 blocked for more than 120 seconds.
...
[ 846.816051] INFO: task swapper/0:1 blocked for more than 120 seconds.
...
[ 967.648055] INFO: task swapper/0:1 blocked for more than 120 seconds.
...
[ 1088.480033] INFO: task swapper/0:1 blocked for more than 120 seconds.
...
[ 1209.312036] INFO: task swapper/0:1 blocked for more than 120 seconds.
...
<end of log>
```
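Note the cadence of the warnings: they repeat roughly every 120 seconds, which matches the kernel's hung-task watchdog default (kernel.hung_task_timeout_secs = 120), so the init task is stuck continuously, not intermittently. A quick check of the intervals between consecutive timestamps from the log above:

```shell
# Differences between consecutive "blocked for more than 120 seconds"
# timestamps copied from the log; each gap is ~120.83 s, i.e. one
# hung-task watchdog period plus a little logging skew.
awk 'BEGIN {
  n = split("242.656089 363.488083 484.320066 605.152061 725.984054", t, " ")
  for (i = 2; i <= n; i++)
    printf "%.2f\n", t[i] - t[i - 1]
}'
# prints 120.83 four times
```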
--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux-aws in Ubuntu.
https://bugs.launchpad.net/bugs/1946149
Title:
Bionic/linux-aws Boot failure downgrading from Bionic/linux-aws-5.4 on
r5.metal
Status in linux-aws package in Ubuntu:
New
Bug description:
When creating an r5.metal instance on AWS, the default kernel is
bionic/linux-aws-5.4 (5.4.0-1056-aws); when changing to bionic/linux-aws
(4.15.0-1113-aws), the machine fails to boot the 4.15 kernel.
If I remove these patches, the instance correctly boots the 4.15 kernel:
https://lists.ubuntu.com/archives/kernel-team/2021-September/123963.html
That said, after successfully updating to the 4.15 kernel without those
patches applied, I can then upgrade to a 4.15 kernel with the above
patches included, and the instance boots properly.
This problem only appears on metal instances, which use NVMe devices
instead of Xen (xvda) devices.
AWS instances also use the 'discard' mount option with ext4, so I
thought there might be a race condition between ext4 discard and the
journal flush. I removed 'discard' from the mount options and rebooted
into the 5.4 kernel prior to installing the 4.15 kernel, but the
instance still wouldn't boot after the 4.15 kernel was installed.
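For reference, a minimal sketch of how the 'discard' option can be dropped for such a test (assuming the root entry lives in /etc/fstab; the paths below are stock Ubuntu defaults, not copied from this instance):

```shell
# Back up fstab, then strip the 'discard' option from every entry.
sudo cp /etc/fstab /etc/fstab.bak
sudo sed -i 's/,discard//g; s/discard,//g' /etc/fstab
# After the next reboot, verify the running root filesystem's options:
findmnt -no OPTIONS /
```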
I have been unable to capture a stack trace using 'aws ec2
get-console-output'. After enabling kdump I was unable to replicate the
failure, so there must be some sort of race involving ext4 and/or NVMe.
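The console fetch mentioned above corresponds to the EC2 CLI's get-console-output subcommand; a sketch of the invocation (the instance ID below is a placeholder, not the real instance):

```shell
# Fetch the latest serial console output for the instance.
# i-0123456789abcdef0 is a placeholder instance ID.
aws ec2 get-console-output \
    --instance-id i-0123456789abcdef0 \
    --output text --query Output
```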
To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux-aws/+bug/1946149/+subscriptions
--
Mailing list: https://launchpad.net/~kernel-packages
Post to : [email protected]
Unsubscribe : https://launchpad.net/~kernel-packages
More help : https://help.launchpad.net/ListHelp