This bug is awaiting verification that the kernel in -proposed solves
the problem. Please test the kernel and update this bug with the
results. If the problem is solved, change the tag
'verification-needed-focal' to 'verification-done-focal'. If the problem
still exists, change the tag 'verification-needed-focal' to
'verification-failed-focal'.

If verification is not done within 5 working days from today, this fix
will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on
how to enable and use -proposed. Thank you!


** Tags added: verification-needed-focal

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux-aws in Ubuntu.
https://bugs.launchpad.net/bugs/1918694

Title:
  aws: fix hibernation issues on c5.18xlarge

Status in linux-aws package in Ubuntu:
  New
Status in linux-aws source package in Focal:
  Fix Committed

Bug description:
  [Impact]

  Hibernation is still unreliable on c5.18xlarge instances. The system
  usually hibernates correctly, but on resume it either performs a
  regular reboot instead of resuming from hibernation, or it gets
  completely stuck after the hibernated kernel is loaded in memory
  (more precisely, it gets stuck while the resume callbacks of the
  hibernated kernel are being executed).

  [Test plan]

  Create a c5.18xlarge instance, run the memory stress test script (the
  same script that we use to stress test hibernation), trigger the
  hibernate event, then trigger the resume event. Repeat this cycle a
  few times; the problem is very likely to reproduce.
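
  The hibernate/resume cycle can be sketched as a small loop. This is
  only an illustration under assumptions not stated in this report: the
  hibernate and resume events are triggered through the EC2 API (a
  boto3 EC2 client passed in as `ec2`), and `read_boot_id` is a
  hypothetical caller-supplied function that returns the contents of
  /proc/sys/kernel/random/boot_id on the instance (for example over
  SSH). A boot ID that changes across a cycle suggests a regular reboot
  rather than a resume; a `read_boot_id` that never returns suggests
  the "stuck on resume" case. The memory stress script is assumed to be
  already running on the instance.

```python
import time


def run_hibernate_cycles(ec2, instance_id, read_boot_id,
                         cycles=3, settle_seconds=60):
    """Hibernate and resume an instance `cycles` times.

    `ec2` is assumed to behave like a boto3 EC2 client.
    `read_boot_id` is a hypothetical helper supplied by the caller.
    Returns one entry per cycle: "resumed" or "rebooted".
    """
    results = []
    for _ in range(cycles):
        before = read_boot_id()

        # Trigger the hibernate event and wait for the instance to stop.
        ec2.stop_instances(InstanceIds=[instance_id], Hibernate=True)
        ec2.get_waiter("instance_stopped").wait(InstanceIds=[instance_id])

        # Trigger the resume event and wait for the instance to run again.
        ec2.start_instances(InstanceIds=[instance_id])
        ec2.get_waiter("instance_running").wait(InstanceIds=[instance_id])

        # Give the guest some time to finish resuming before probing it.
        time.sleep(settle_seconds)

        # The boot ID is kept across a successful resume and changes on
        # a regular reboot.
        after = read_boot_id()
        results.append("resumed" if after == before else "rebooted")
    return results
```

  With boto3 installed, `ec2` could be created with
  `boto3.client("ec2")`; the instance must have been launched with
  hibernation enabled.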

  [Fix]

  Amazon pointed out two fixes that should address both issues:

  1) Upstream patch "PM: hibernate: flush swap writer after marking":
  this prevents the regular reboot issue, because it ensures that the
  I/O is always flushed after, not before, writing the hibernation
  signature.

  2) Reserve more space for HVC_BOOT_ARRAY_SIZE: this is a temporary
  solution (SAUCE PATCH for now) suggested by Amazon, who are working
  on a proper (more elegant) fix. Doubling the size of
  HVC_BOOT_ARRAY_SIZE seems to resolve the problem: we have tested
  this change extensively in the AWS cloud and it seems to prevent the
  "system stuck on resume" issue from happening.

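  As a rough illustration of why doubling HVC_BOOT_ARRAY_SIZE might
  matter on this instance type, the sketch below works through the slot
  arithmetic. None of the numbers come from this report: it assumes
  that the array is sized as one page divided by the size of a per-CPU
  clock entry, that a page is 4096 bytes, that an entry is 64 bytes,
  and that a c5.18xlarge exposes 72 vCPUs. The function names are
  hypothetical.

```python
# All constants below are assumptions, not values taken from the report.
PAGE_SIZE = 4096   # bytes per page (typical for x86-64)
ENTRY_SIZE = 64    # assumed bytes per per-CPU clock entry
VCPUS = 72         # assumed vCPU count of a c5.18xlarge


def boot_array_slots(pages: int) -> int:
    """Number of per-CPU entries that fit in `pages` statically reserved pages."""
    return pages * PAGE_SIZE // ENTRY_SIZE


def cpus_without_slot(vcpus: int, slots: int) -> int:
    """Number of CPUs that do not fit in the static boot array."""
    return max(0, vcpus - slots)


original_slots = boot_array_slots(1)  # one page of entries
doubled_slots = boot_array_slots(2)   # array size doubled

print("original:", original_slots, "slots,",
      cpus_without_slot(VCPUS, original_slots), "CPUs without a slot")
print("doubled: ", doubled_slots, "slots,",
      cpus_without_slot(VCPUS, doubled_slots), "CPUs without a slot")
```

  Under these assumptions, the original array is too small for every
  vCPU of the instance, while the doubled array covers all of them at
  the cost of one extra statically reserved page.
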
  [Regression potential]

  The first patch touches only the hibernation code, so potential
  regressions could be experienced only in the hibernation scenario.
  The second patch is more of a hack at the moment and it affects
  kvmclock. Increasing the size of HVC_BOOT_ARRAY_SIZE could
  potentially introduce regressions on small KVM systems; a better
  solution would be to allocate the hv_clock_boot array dynamically,
  which is the proper fix that Amazon is currently working on. Once
  that fix is published upstream, we should apply it and drop this
  SAUCE PATCH.

