@rbalint, we have not seen the same issue with the 4.15 linux-aws-hwe
kernels used in xenial. At this time, the backport to xenial is not
necessary.

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux-aws in Ubuntu.
https://bugs.launchpad.net/bugs/1864041

Title:
  xen_netfront devices unresponsive after hibernation/resume

Status in ec2-hibinit-agent package in Ubuntu:
  Fix Released
Status in linux-aws package in Ubuntu:
  New
Status in ec2-hibinit-agent source package in Xenial:
  Incomplete
Status in linux-aws source package in Xenial:
  New
Status in ec2-hibinit-agent source package in Bionic:
  Fix Released
Status in linux-aws source package in Bionic:
  New
Status in ec2-hibinit-agent source package in Eoan:
  Fix Released
Status in linux-aws source package in Eoan:
  Won't Fix
Status in ec2-hibinit-agent source package in Focal:
  Fix Released
Status in linux-aws source package in Focal:
  New

Bug description:
  [Impact]

  The xen_netfront device is sometimes unresponsive after a hibernate
  and resume event. This is limited to the c4, c5, m4, m5, r4, r5
  instance families, all of which are xen based, and support
  hibernation.

  When the issue occurs, the instance is inaccessible without a full
  restart. Debugging by running a process which outputs regularly to the
  serial console shows that the instance is still running.

  [Test Case]

  1) Launch a c4, c5, m4, m5, r4, or r5 instance with a 5.0 or 5.3 kernel
     and on-demand hibernation support enabled.
  2) Start a long-running process that writes messages to the serial
     console.
  3) Begin observing these messages on the console (using the AWS UI or
     CLI to grab a screenshot).
  4) Suspend and resume the instance, continuing to refresh the console
     screenshot.
  5) The screenshot should continue to show updates even if ssh access is
     no longer working.
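
  As a rough illustration of the long-running process in step 2, the
  sketch below appends a numbered, timestamped line to a target at a
  fixed interval. The function name heartbeat and its arguments are
  hypothetical, and it assumes that /dev/console is routed to the serial
  console on the instance, which typically requires root to write to.

```shell
#!/bin/sh
# heartbeat TARGET COUNT INTERVAL
#   TARGET   file or device to append to (e.g. /dev/console; assumption)
#   COUNT    number of lines to write, 0 means run until killed
#   INTERVAL seconds to sleep between lines
heartbeat() {
    target="$1"
    count="$2"
    interval="$3"
    i=0
    while [ "$count" -eq 0 ] || [ "$i" -lt "$count" ]; do
        i=$((i + 1))
        # Sequence number plus UTC timestamp makes a stalled console obvious.
        printf 'heartbeat %d %s\n' "$i" "$(date -u +%Y-%m-%dT%H:%M:%SZ)" >> "$target"
        sleep "$interval"
    done
}
```

  For example, "heartbeat /dev/console 0 5 &" run as root would keep
  writing a line every five seconds in the background. If the sequence
  numbers keep advancing in the console screenshot after resume while ssh
  is unreachable, the instance is running and only networking is affected.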

  [Regression Potential]

  The workaround in ec2-hibinit-agent reloads the xen_netfront kernel
  module before restarting systemd-networkd. If the kernel module has
  been removed (for example when hitting LP: #1615381), the reload fails
  and the instance cannot restore network connections. This is expected
  to be a very rare situation, and the module reload is the best
  workaround the Kernel Team found to mitigate the original issue.

  The workaround also adds a 2 second delay before reloading the module
  to let things settle a bit after resuming. The 2 seconds is very short
  compared to the overall time needed to resume an instance.
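
  The sequence described above (delay, module reload, networkd restart)
  could be sketched roughly as follows. This is not the actual
  ec2-hibinit-agent code: the run and reload_netfront helpers and the
  DRY_RUN and SETTLE_SECONDS variables are hypothetical names chosen for
  illustration. DRY_RUN defaults to 1, so the script only prints the
  commands unless it is explicitly run with DRY_RUN=0 as root.

```shell
#!/bin/sh
# Sketch of a post-resume hook; names and structure are assumptions.
DRY_RUN="${DRY_RUN:-1}"                 # 1 = print only, 0 = execute
SETTLE_SECONDS="${SETTLE_SECONDS:-2}"   # delay mentioned in the bug report

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# Stop at the first failing step so networkd is not restarted
# against a module that could not be reloaded.
reload_netfront() {
    run sleep "$SETTLE_SECONDS" || return 1
    run modprobe -r xen_netfront || return 1
    run modprobe xen_netfront || return 1
    run systemctl restart systemd-networkd
}

reload_netfront
```

  Returning early on a failed modprobe makes the failure mode from the
  paragraph above visible in the exit status instead of hiding it behind
  a networkd restart.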

  [Original Bug Text]

  The xen_netfront device is sometimes unresponsive after a hibernate
  and resume event. This is limited to the c4, c5, m4, m5, r4, r5
  instance families, all of which are xen based, and support
  hibernation.

  When the issue occurs, the instance is inaccessible without a full
  restart. Debugging by running a process which outputs regularly to the
  serial console shows that the instance is still running.

  A workaround is to build the xen_netfront module separately and
  restart the module and networking during the resume handler. For
  example:

  modprobe -r xen_netfront
  modprobe xen_netfront
  systemctl restart systemd-networkd

  With this workaround in place, the unresponsive issue is no longer
  observed.
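
  Because the workaround relies on xen_netfront being a loadable module,
  it may help to check first whether the running kernel has the driver
  built in. The sketch below looks the module up in the kernel's
  modules.builtin list; the function name is_builtin is hypothetical, and
  the default path assumes the usual /lib/modules/<release> layout.

```shell
#!/bin/sh
# is_builtin MODULE [BUILTIN_FILE]
# Returns 0 if MODULE is listed as built in, 1 if it is not,
# and 2 if the list cannot be read.
is_builtin() {
    # Module file names may use '-' where the module name uses '_'.
    name=$(printf '%s' "$1" | tr '_' '-')
    file="${2:-/lib/modules/$(uname -r)/modules.builtin}"
    [ -r "$file" ] || return 2
    # Normalise the list the same way and match the file name exactly.
    tr '_' '-' < "$file" | grep -q "/${name}\.ko\$"
}
```

  A resume handler could then skip the reload when "is_builtin
  xen_netfront" succeeds, since modprobe -r cannot unload a built-in
  driver.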

  To reproduce this problem:

  1) Launch a c4, c5, m4, m5, r4, or r5 instance with a 5.0 or 5.3 kernel
     and on-demand hibernation support enabled.
  2) Start a long-running process that writes messages to the serial
     console.
  3) Begin observing these messages on the console (using the AWS UI or
     CLI to grab a screenshot).
  4) Suspend and resume the instance, continuing to refresh the console
     screenshot.
  5) The screenshot should continue to show updates even if ssh access is
     no longer working.
  ---
  ProblemType: Bug
  ApportVersion: 2.20.9-0ubuntu7.9
  Architecture: amd64
  DistroRelease: Ubuntu 18.04
  Ec2AMI: ami-0edf3b95e26a682df
  Ec2AMIManifest: (unknown)
  Ec2AvailabilityZone: us-west-2a
  Ec2InstanceType: m4.large
  Ec2Kernel: unavailable
  Ec2Ramdisk: unavailable
  Package: linux-aws 4.15.0.1058.59
  PackageArchitecture: amd64
  ProcEnviron:
   TERM=screen
   PATH=(custom, no user)
   XDG_RUNTIME_DIR=<set>
   LANG=C.UTF-8
   SHELL=/bin/bash
  ProcVersionSignature: Ubuntu 5.0.0-1025.28-aws 5.0.21
  Tags:  bionic ec2-images
  Uname: Linux 5.0.0-1025-aws x86_64
  UpgradeStatus: No upgrade log present (probably fresh install)
  UserGroups: adm audio cdrom dialout dip floppy lxd netdev plugdev sudo video
  _MarkForUpload: True

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/ec2-hibinit-agent/+bug/1864041/+subscriptions

-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp
