> This bug is still present on 14.04 using linux-generic-lts-xenial
> kernel 4.4.0-87-generic.

That's correct, and there is no planned change for the standard kernel.
Only the linux-aws kernel is being changed to address this issue, by
disabling Xen memory ballooning, as described in comment 50.


A bit more detail on the issue:

1. The AWS Xen hypervisor boots Linux and provides the e820 map and the Xen
balloon target.
2. The Ubuntu kernel boots and sets up all memory listed in the e820 map.
3. The Xen balloon driver notices that total memory doesn't quite match its
target, so it requests some pages from the Xen hypervisor.
4. The AWS Xen hypervisor allows the Ubuntu kernel balloon driver to have
exactly 11 more pages, which are registered with the Ubuntu kernel as
hotplugged memory (the hypervisor rejects requests for any more balloon pages).
5. The new hotplugged balloon pages are enabled (onlined) via udev, kernel
config, or sysfs, which makes them available for general use.
6. If any NVMe I/O operation uses one of those 11 balloon pages for DMA, the
hypervisor sees that the page's physical address is outside its e820 map
address range (because it was a hotplugged page) and fails the NVMe I/O.
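To make the sequence concrete, here is a toy Python model of steps 1-6. It
is not Xen or kernel code: the ToyHypervisor class, the simulate() helper,
the 4 KiB page size, the example e820 ranges, and the choice to place the
hotplugged pages directly above the top of the e820 map are all
illustrative assumptions. The model treats every granted page as onlined
and used for DMA.

```python
from dataclasses import dataclass

PAGE_SIZE = 4096  # assumption: 4 KiB pages


@dataclass
class ToyHypervisor:
    """Hypothetical stand-in for the AWS Xen hypervisor's behaviour."""
    e820: list                     # boot e820 RAM ranges as (start, end), end exclusive
    balloon_grant_limit: int = 11  # total balloon pages it will grant (step 4)
    granted: int = 0

    def request_balloon_pages(self, n):
        """Step 4: grant up to the remaining limit and reject anything beyond it."""
        n_ok = max(0, min(n, self.balloon_grant_limit - self.granted))
        self.granted += n_ok
        return n_ok

    def dma_allowed(self, paddr):
        """Step 6: DMA succeeds only for addresses inside the boot e820 map."""
        return any(start <= paddr < end for start, end in self.e820)


def simulate(hv, requested_pages):
    """Steps 3-6: the balloon driver asks for pages, then NVMe DMA targets them."""
    top = max(end for _, end in hv.e820)
    granted = hv.request_balloon_pages(requested_pages)
    # Toy assumption: hotplugged pages sit directly above the e820 map.
    hotplugged = [top + i * PAGE_SIZE for i in range(granted)]
    failed_dma = [p for p in hotplugged if not hv.dma_allowed(p)]
    return granted, failed_dma


# The bug: 11 pages are granted, and DMA to every one of them fails.
hv = ToyHypervisor(e820=[(0x0, 0x9F000), (0x100000, 0x100000 + 256 * PAGE_SIZE)])
granted, failed = simulate(hv, requested_pages=100)
print(granted, len(failed))  # 11 11
```

Setting balloon_grant_limit=0 models a hypervisor fix at step 4 (no pages
are hotplugged, so nothing fails), and calling simulate() with
requested_pages=0 on a fresh ToyHypervisor models the linux-aws approach of
never asking for balloon pages in the first place.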

The problem lies in either #4 or #6 above: the hypervisor should either
reject all requests for additional hotplugged memory pages (step 4) or
allow DMA using hotplugged memory pages (step 6).  Any change to the
Ubuntu kernel only works around this hypervisor problem by not enabling
any hotplugged pages.

AWS is well aware of this and is investigating what changes can be made
to their hypervisor, but I am not part of those discussions, so I can't
provide any more detail on whether or when AWS might fix #4 and/or #6.
I will note that the Amazon Linux kernel has Xen ballooning disabled,
and I believe the RHEL kernel does as well, so both of them have likewise
only worked around this issue.

Until the AWS hypervisor is changed, there are various options to work
around the issue:

Trusty:
The Trusty 14.04 release does have Xen ballooning enabled, and it does
hotplug memory; however, the udev rules do not enable the hotplugged
memory, so this issue does not occur on Trusty (unless the hotplugged
memory is manually enabled).

Xenial with 4.4 kernel:
The standard 4.4 kernel in Xenial does have Xen ballooning enabled, because
it may be desired under non-AWS Xen hypervisors.  The recommended way to
work around the issue is to edit the 40-vm-hotadd.rules udev rules file as
described in comment 29.

Xenial with HWE kernel, or Zesty:
Starting with the 4.8 kernel, hotplugged memory is automatically onlined,
so in addition to editing the udev rule as described above (under Xenial
with 4.4 kernel), you must also add a kernel boot parameter as described
in comment 44.

Xenial linux-aws:
The linux-aws kernel has Xen ballooning disabled in the kernel configuration, 
so it will not cause any memory to be hotplugged, thus avoiding the problem; no 
other workaround is required when using the linux-aws kernel.
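Whichever workaround is used, it can help to confirm on the instance
whether hotplugged memory is actually being left offline. Below is a
minimal Python sketch that reads the standard sysfs memory-hotplug files,
/sys/devices/system/memory/memoryN/state and, on kernels that provide it,
/sys/devices/system/memory/auto_online_blocks. The helper names are
hypothetical, and the script only reports block states; it cannot tell
which blocks came from the balloon driver.

```python
import os
import re

MEMORY_ROOT = "/sys/devices/system/memory"  # standard sysfs memory-hotplug directory


def memory_block_states(root=MEMORY_ROOT):
    """Return {block_number: state} for every memoryN directory under root."""
    try:
        names = os.listdir(root)
    except OSError:
        return {}  # no memory-hotplug sysfs here (or not running on Linux)
    states = {}
    for name in names:
        m = re.fullmatch(r"memory(\d+)", name)
        if m is None:
            continue  # skip entries such as auto_online_blocks or power/
        try:
            with open(os.path.join(root, name, "state")) as f:
                states[int(m.group(1))] = f.read().strip()
        except OSError:
            continue  # block disappeared or has no readable state file
    return states


def auto_online_setting(root=MEMORY_ROOT):
    """Return the auto_online_blocks value, or None if the kernel lacks the file."""
    try:
        with open(os.path.join(root, "auto_online_blocks")) as f:
            return f.read().strip()
    except OSError:
        return None


def summarize(root=MEMORY_ROOT):
    """One-line summary: block count, offline block numbers, auto-online default."""
    states = memory_block_states(root)
    offline = sorted(n for n, s in states.items() if s == "offline")
    return (f"{len(states)} memory blocks, {len(offline)} offline {offline}; "
            f"auto_online_blocks={auto_online_setting(root)}")
```

On the instance, print(summarize()) shows the current state. With a
workaround in effect, the balloon-hotplugged blocks would be expected to
appear as offline; an empty offline list is consistent either with them
having been onlined or with no balloon pages having been hotplugged at all.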


I am marking this as "wont fix" for the standard Xenial kernel.


** Changed in: linux (Ubuntu Xenial)
       Status: Triaged => Won't Fix

** Changed in: linux (Ubuntu)
       Status: Triaged => Won't Fix

-- 
You received this bug notification because you are a member of नेपाली
भाषा समायोजकहरुको समूह (Group of Nepali Language Coordinators), which is
subscribed to Xenial.
Matching subscriptions: Ubuntu 16.04 Bugs
https://bugs.launchpad.net/bugs/1668129

Title:
  Amazon I3 Instance Buffer I/O error on dev nvme0n1

Status in linux package in Ubuntu:
  Won't Fix
Status in linux-aws package in Ubuntu:
  Fix Committed
Status in linux source package in Xenial:
  Won't Fix
Status in linux-aws source package in Xenial:
  Fix Committed

Bug description:
  On the AWS i3 instance class, when putting the new NVMe storage disks
  under high I/O load, we are seeing data corruption and errors in dmesg:

  [  662.884390] blk_update_request: I/O error, dev nvme0n1, sector 120063912
  [  662.887824] Buffer I/O error on dev nvme0n1, logical block 14971093, lost async page write
  [  662.891254] Buffer I/O error on dev nvme0n1, logical block 14971094, lost async page write
  [  662.895591] Buffer I/O error on dev nvme0n1, logical block 14971095, lost async page write
  [  662.899873] Buffer I/O error on dev nvme0n1, logical block 14971096, lost async page write
  [  662.904179] Buffer I/O error on dev nvme0n1, logical block 14971097, lost async page write
  [  662.908458] Buffer I/O error on dev nvme0n1, logical block 14971098, lost async page write
  [  662.912287] Buffer I/O error on dev nvme0n1, logical block 14971099, lost async page write
  [  662.916047] Buffer I/O error on dev nvme0n1, logical block 14971100, lost async page write
  [  662.920285] Buffer I/O error on dev nvme0n1, logical block 14971101, lost async page write
  [  662.924565] Buffer I/O error on dev nvme0n1, logical block 14971102, lost async page write
  [  663.645530] blk_update_request: I/O error, dev nvme0n1, sector 120756912
  <snip>
  [ 1012.752265] blk_update_request: I/O error, dev nvme0n1, sector 3744
  [ 1012.755396] buffer_io_error: 194552 callbacks suppressed
  [ 1012.755398] Buffer I/O error on dev nvme0n1, logical block 20, lost async page write
  [ 1012.759248] Buffer I/O error on dev nvme0n1, logical block 21, lost async page write
  [ 1012.763368] Buffer I/O error on dev nvme0n1, logical block 22, lost async page write
  [ 1012.767271] Buffer I/O error on dev nvme0n1, logical block 23, lost async page write
  [ 1012.771314] Buffer I/O error on dev nvme0n1, logical block 24, lost async page write

  Able to replicate this with a bonnie++ stress test.

  bonnie++ -d /mnt/test/ -r 1000

  Linux i-0d76e144d85f487cf 4.4.0-64-generic #85-Ubuntu SMP Mon Feb 20 11:50:30 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
  ---
  AlsaDevices:
   total 0
   crw-rw---- 1 root audio 116,  1 Feb 27 02:12 seq
   crw-rw---- 1 root audio 116, 33 Feb 27 02:12 timer
  AplayDevices: Error: [Errno 2] No such file or directory
  ApportVersion: 2.20.1-0ubuntu2.5
  Architecture: amd64
  ArecordDevices: Error: [Errno 2] No such file or directory
  AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
  CRDA: N/A
  DistroRelease: Ubuntu 16.04
  Ec2AMI: ami-bc62b2aa
  Ec2AMIManifest: (unknown)
  Ec2AvailabilityZone: us-east-1d
  Ec2InstanceType: i3.2xlarge
  Ec2Kernel: unavailable
  Ec2Ramdisk: unavailable
  IwConfig: Error: [Errno 2] No such file or directory
  JournalErrors:
   Error: command ['journalctl', '-b', '--priority=warning', '--lines=1000'] failed with exit code 1: Hint: You are currently not seeing messages from other users and the system.
         Users in the 'systemd-journal' group can see all messages. Pass -q to
         turn off this notice.
   No journal files were opened due to insufficient permissions.
  Lsusb: Error: command ['lsusb'] failed with exit code 1:
  MachineType: Xen HVM domU
  Package: linux (not installed)
  PciMultimedia:

  ProcEnviron:
   TERM=screen-256color
   PATH=(custom, no user)
   XDG_RUNTIME_DIR=<set>
   LANG=en_US.UTF-8
   SHELL=/bin/bash
  ProcFB:

  ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-4.4.0-64-generic root=UUID=cfda0544-9803-41e7-badb-43563085ff3a ro console=tty1 console=ttyS0
  ProcVersionSignature: Ubuntu 4.4.0-64.85-generic 4.4.44
  RelatedPackageVersions:
   linux-restricted-modules-4.4.0-64-generic N/A
   linux-backports-modules-4.4.0-64-generic  N/A
   linux-firmware                            N/A
  RfKill: Error: [Errno 2] No such file or directory
  Tags:  xenial ec2-images
  Uname: Linux 4.4.0-64-generic x86_64
  UpgradeStatus: No upgrade log present (probably fresh install)
  UserGroups:

  WifiSyslog:

  _MarkForUpload: True
  dmi.bios.date: 12/12/2016
  dmi.bios.vendor: Xen
  dmi.bios.version: 4.2.amazon
  dmi.chassis.type: 1
  dmi.chassis.vendor: Xen
  dmi.modalias: dmi:bvnXen:bvr4.2.amazon:bd12/12/2016:svnXen:pnHVMdomU:pvr4.2.amazon:cvnXen:ct1:cvr:
  dmi.product.name: HVM domU
  dmi.product.version: 4.2.amazon
  dmi.sys.vendor: Xen

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1668129/+subscriptions
