** Description changed:

  [Impact]
  
  Hibernation currently fails for all AWS Xen instance types
  (c3/c4/i3/m3/m4/r3/r4/t2) with all Jammy 5.15 and Kinetic 5.19 linux-aws
  kernels.
  
  When attempting to hibernate, the system gets stuck in
  sync_inodes_one_sb() when processing the rootfs, fails to hibernate, and
  shuts down. When you start the instance, it starts fresh, and does not
  resume from the incomplete hibernation image. Networking is also broken,
  and you cannot ssh in.
  
  Upon review of the jammy/linux-aws git log, it appears that the kernel
  is missing AWS hibernation enablement patches entirely. These need to be
  included to get hibernation working.
  
  [Fix]
  
  Hibernation currently works on the Amazon Linux 2 5.15 Kernel:
  https://github.com/amazonlinux/linux/tree/amazon-5.15.y/mainline
  
  After careful review of the amazon-5.15.y/mainline branch, we have found
  the below set of patches authored by the Amazon AWS Hibernation team to be
  minimally sufficient to get hibernation working on both Jammy 5.15 and
  Kinetic 5.19.
  
- x86: Disable KASLR when Xen is detected
  xen: Restore xen-pirqs on resume from hibernation
  xen-netfront: call netif_device_attach on resume
  xen: Only restore the ACPI SCI interrupt in xen_restore_pirqs.
  xen: restore pirqs on resume from hibernation.
  block: xen-blkfront: consider new dom0 features on restore
  x86: tsc: avoid system instability in hibernation
  xen-blkfront: Fixed blkfront_restore to remove a call to negotiate_mq
  Revert "xen: dont fiddle with event channel masking in suspend/resume"
  PM / hibernate: update the resume offset on SNAPSHOT_SET_SWAP_AREA
  x86/xen: close event channels for PIRQs in system core suspend callback
  xen/events: add xen_shutdown_pirqs helper function
  x86/xen: save and restore steal clock
  xen/time: introduce xen_{save,restore}_steal_clock
  xen-netfront: add callbacks for PM suspend and hibernation support
  xen-blkfront: add callbacks for PM suspend and hibernation
  x86/xen: add system core suspend and resume callbacks
  x86/xen: Introduce new function to map HYPERVISOR_shared_info on Resume
  xenbus: add freeze/thaw/restore callbacks support
  xen/manage: introduce helper function to know the on-going suspend mode
  xen/manage: keep track of the on-going suspend mode
  
  These patches will be carried as SAUCE patches, and their subjects
  marked with "UBUNTU: SAUCE [aws]". Their upstream is the Amazon
  Hibernation team, with the repo being the Amazon Linux 2 kernel repo.
  
  [Testcase]
  
  1. Log into Amazon EC2.
  2. Select Launch Instance.
  3. Under Instance Type, select any from (c3/c4/i3/m3/m4/r3/r4/t2). I suggest t2.medium.
  4. Select the "Ubuntu 22.04 LTS HVM (SSD type)" AMI in the quicklaunch pane.
  5. Select your SSH keypair.
  6. In storage, select 20 GB. Go to the advanced tab, and set Encrypted: Yes.
  7. Under Advanced Settings for the instance, set "Stop - Hibernate" to Enable.
  8. Create the Instance. SSH in.
  9. Wait 5 minutes for hibinit-agent to create the /swap-hibinit swapfile and configure grub.
  10. Start a screen session. Echo some text and then detach with ctrl-a d.
  11. Log out from instance.
  12. In EC2, select "Instance State" > "Hibernate".
  13. Wait 30 seconds to one minute. The state will go from "Stopping" to "Stopped".
  14. Start the instance again.
  15. SSH in.
  16. Attempt to resume screen session with "screen -r".
  
  If you are not able to ssh into the instance, hibernation has failed. If
  ssh works and the screen session is still running, hibernation was
  successful.
  
  Alternatively, the CPC team can run their Hibernation testsuite over
  Jammy and Kinetic.
  
  We have built test kernels for Jammy and Kinetic with the patches, and
  they are available in the below ppa:
  
+ https://launchpad.net/~gerald-yang-tw/+archive/ubuntu/aws-hibernate-test
+ 
  If you hibernate and resume with the test kernels, hibernation
  is successful.
  
  [Where problems could occur]
  
  We are adding a significant amount of code to the Xen subsystem, spread
  across many commits. This code has not been mainlined, and is instead
  maintained out of tree by the Amazon AWS Hibernation team.
  
  The changes target hibernation, block devices, and clock devices,
  specific to those used on AWS Xen instances. Most of these patches have
  been applied to Xenial, Bionic, Focal and other series for a long time,
  but some patches are new for 5.15 onward.
  
  The changes will only target linux-aws to try and limit regression risk
  to AWS users, and any regressions will be limited to users of Xen based
  instance types (c3/c4/i3/m3/m4/r3/r4/t2), covering both Xen 4.2 and Xen
  4.11.
  
  If a regression were to occur, the instance would likely fail to
  hibernate, and at worst, write an incomplete hibernation image to the
  swapfile. The kernel will see this on start, and instead of resuming
  from the hibernation image, will start fresh. It is unlikely to cause
  any filesystem corruption on the rootfs, but any in-progress
  computations at the time of hibernation could be lost. The current
  broken behaviour breaks networking, and users would have to power cycle
  the instance a few times before they can ssh in again.

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux-aws in Ubuntu.
https://bugs.launchpad.net/bugs/1968062

Title:
  Jammy / Kinetic: Enable Hibernation for Xen Based Instance Types

Status in linux-aws package in Ubuntu:
  In Progress
Status in linux-aws source package in Jammy:
  In Progress
Status in linux-aws source package in Kinetic:
  In Progress

Bug description:
  [Impact]

  Hibernation currently fails for all AWS Xen instance types
  (c3/c4/i3/m3/m4/r3/r4/t2) with all Jammy 5.15 and Kinetic 5.19
  linux-aws kernels.

  When attempting to hibernate, the system gets stuck in
  sync_inodes_one_sb() when processing the rootfs, fails to hibernate,
  and shuts down. When you start the instance, it starts fresh, and does
  not resume from the incomplete hibernation image. Networking is also
  broken, and you cannot ssh in.

  Upon review of the jammy/linux-aws git log, it appears that the kernel
  is missing AWS hibernation enablement patches entirely. These need to
  be included to get hibernation working.

  [Fix]

  Hibernation currently works on the Amazon Linux 2 5.15 Kernel:
  https://github.com/amazonlinux/linux/tree/amazon-5.15.y/mainline

  After careful review of the amazon-5.15.y/mainline branch, we have
  found the below set of patches authored by the Amazon AWS Hibernation team
  to be minimally sufficient to get hibernation working on both Jammy
  5.15 and Kinetic 5.19.

  xen: Restore xen-pirqs on resume from hibernation
  xen-netfront: call netif_device_attach on resume
  xen: Only restore the ACPI SCI interrupt in xen_restore_pirqs.
  xen: restore pirqs on resume from hibernation.
  block: xen-blkfront: consider new dom0 features on restore
  x86: tsc: avoid system instability in hibernation
  xen-blkfront: Fixed blkfront_restore to remove a call to negotiate_mq
  Revert "xen: dont fiddle with event channel masking in suspend/resume"
  PM / hibernate: update the resume offset on SNAPSHOT_SET_SWAP_AREA
  x86/xen: close event channels for PIRQs in system core suspend callback
  xen/events: add xen_shutdown_pirqs helper function
  x86/xen: save and restore steal clock
  xen/time: introduce xen_{save,restore}_steal_clock
  xen-netfront: add callbacks for PM suspend and hibernation support
  xen-blkfront: add callbacks for PM suspend and hibernation
  x86/xen: add system core suspend and resume callbacks
  x86/xen: Introduce new function to map HYPERVISOR_shared_info on Resume
  xenbus: add freeze/thaw/restore callbacks support
  xen/manage: introduce helper function to know the on-going suspend mode
  xen/manage: keep track of the on-going suspend mode

  These patches will be carried as SAUCE patches, and their subjects
  marked with "UBUNTU: SAUCE [aws]". Their upstream is the Amazon
  Hibernation team, with the repo being the Amazon Linux 2 kernel repo.

  [Testcase]

  1. Log into Amazon EC2.
  2. Select Launch Instance.
  3. Under Instance Type, select any from (c3/c4/i3/m3/m4/r3/r4/t2). I suggest t2.medium.
  4. Select the "Ubuntu 22.04 LTS HVM (SSD type)" AMI in the quicklaunch pane.
  5. Select your SSH keypair.
  6. In storage, select 20 GB. Go to the advanced tab, and set Encrypted: Yes.
  7. Under Advanced Settings for the instance, set "Stop - Hibernate" to Enable.
  8. Create the Instance. SSH in.
  9. Wait 5 minutes for hibinit-agent to create the /swap-hibinit swapfile and configure grub.
  10. Start a screen session. Echo some text and then detach with ctrl-a d.
  11. Log out from instance.
  12. In EC2, select "Instance State" > "Hibernate".
  13. Wait 30 seconds to one minute. The state will go from "Stopping" to "Stopped".
  14. Start the instance again.
  15. SSH in.
  16. Attempt to resume screen session with "screen -r".

  If you are not able to ssh into the instance, hibernation has failed.
  If ssh works and the screen session is still running, hibernation was
  successful.
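
  Step 9 asks the tester to wait for hibinit-agent to finish. As a rough
  aid, the sketch below checks whether the swapfile is already active by
  parsing /proc/swaps inside the instance. The function name and the
  "ready" / "not ready yet" messages are made up for illustration; only
  the /swap-hibinit path comes from the steps above. It does not check
  the grub configuration.

```python
from pathlib import Path


def swapfile_active(proc_swaps_text: str, swapfile: str = "/swap-hibinit") -> bool:
    """Return True if `swapfile` is listed as active swap in /proc/swaps text."""
    lines = proc_swaps_text.splitlines()
    # The first line of /proc/swaps is a column header; every later line
    # starts with the path of an active swap device or swap file.
    for line in lines[1:]:
        fields = line.split()
        if fields and fields[0] == swapfile:
            return True
    return False


if __name__ == "__main__":
    swaps = Path("/proc/swaps")
    # Treat a missing /proc/swaps (non-Linux host) as "no swap active".
    text = swaps.read_text() if swaps.exists() else ""
    print("ready" if swapfile_active(text) else "not ready yet")
```

  If this reports "not ready yet" after the 5 minute wait, hibernating
  the instance is unlikely to be a meaningful test of the kernel.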

  Alternatively, the CPC team can run their Hibernation testsuite over
  Jammy and Kinetic.

  We have built test kernels for Jammy and Kinetic with the patches, and
  they are available in the below ppa:

  https://launchpad.net/~gerald-yang-tw/+archive/ubuntu/aws-hibernate-test

  If you hibernate and resume with the test kernels, hibernation
  is successful.

  [Where problems could occur]

  We are adding a significant amount of code to the Xen subsystem,
  spread across many commits. This code has not been mainlined, and is
  instead maintained out of tree by the Amazon AWS Hibernation team.

  The changes target hibernation, block devices, and clock devices,
  specific to those used on AWS Xen instances. Most of these patches
  have been applied to Xenial, Bionic, Focal and other series for a long
  time, but some patches are new for 5.15 onward.

  The changes will only target linux-aws to try and limit regression
  risk to AWS users, and any regressions will be limited to users of Xen
  based instance types (c3/c4/i3/m3/m4/r3/r4/t2), covering both Xen 4.2
  and Xen 4.11.

  If a regression were to occur, the instance would likely fail to
  hibernate, and at worst, write an incomplete hibernation image to the
  swapfile. The kernel will see this on start, and instead of resuming
  from the hibernation image, will start fresh. It is unlikely to cause
  any filesystem corruption on the rootfs, but any in-progress
  computations at the time of hibernation could be lost. The current
  broken behaviour breaks networking, and users would have to power
  cycle the instance a few times before they can ssh in again.


