Mauricio,

Interesting update. I agree that we need more information about the
state of the instance when it won't boot after switching to the new
4.15 kernel. I'll check with my team in the morning and see whether we
can get additional information from AWS.

I tried a few more scenarios this evening; the first is the most
interesting.

Scenario 1
start with 5.4.0-1056-aws
install 5.4.0-1058-aws
reboot
confirm 5.4.0-1058-aws booted
reboot AGAIN
install 4.15.0-1113-aws
reboot
machine booted 4.15.0-1113-aws successfully

Scenario 2
start with 5.4.0-1056-aws
install 4.15.0-1112-aws
reboot
install 4.15.0-1113-aws
reboot
confirm 4.15.0-1113-aws booted
boot back into 5.4.0-1056-aws
remove 4.15.0-1112-aws and 4.15.0-1113-aws
reboot again for good measure
confirm still running 5.4.0-1056-aws
install 4.15.0-1113-aws
reboot
machine booted 4.15.0-1113-aws successfully

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux-aws in Ubuntu.
https://bugs.launchpad.net/bugs/1946149

Title:
  Bionic/linux-aws Boot failure downgrading from Bionic/linux-aws-5.4 on
  r5.metal

Status in linux-aws package in Ubuntu:
  New

Bug description:
  When creating an r5.metal instance on AWS, the default kernel is
  bionic/linux-aws-5.4 (5.4.0-1056-aws). When changing to
  bionic/linux-aws (4.15.0-1113-aws), the machine fails to boot the
  4.15 kernel.

  If I remove these patches, the instance boots the 4.15 kernel
  correctly:

  https://lists.ubuntu.com/archives/kernel-team/2021-September/123963.html

  That said, after successfully switching to a 4.15 kernel without
  those patches applied, I can then upgrade to a 4.15 kernel that
  includes them, and the instance boots properly.

  This problem only appears on metal instances, which use NVMe
  devices instead of xvda devices.

  AWS instances also use the 'discard' mount option with ext4, so I
  thought there might be a race condition between ext4 discard and the
  journal flush. I removed 'discard' from the mount options and
  rebooted into the 5.4 kernel before installing the 4.15 kernel, but
  the instance still wouldn't boot after the 4.15 kernel was installed.

  I have been unable to capture a stack trace using
  'aws ec2 get-console-output'. After enabling kdump, I was unable to
  reproduce the failure, which suggests some sort of race involving
  ext4 and/or NVMe.


