> We see an address of 0xfc7ffb000

Hi Matt,

I don't think you're accounting for the additional pages added by the
Xen balloon driver, are you?  They increase the guest's physical memory
after boot.  If you check /proc/zoneinfo, look at the Normal zone's
spanned pages and start_pfn, e.g.:

Node 0, zone   Normal
  pages free     15116671
        min      7661
        low      22873
        high     38085
   node_scanned  0
        spanned  15499264
        present  15499264
        managed  15212161
...
  start_pfn:           1048576


and so the zone ends at pfn:
$ printf "%x\n" $(( 1048576 + 15499264 ))
fc8000

meaning that the address you see (pfn 0xfc7ffb, with 4 KiB pages) falls
just below that zone end, so it is part of the pages in the balloon
memory region...
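
As a sanity check on that arithmetic, here is a minimal shell sketch that
converts the reported address to a page frame number and tests it against
the two zone ends discussed in this message.  It assumes 4 KiB pages (a
shift of 12, as on x86_64); the function names addr_to_pfn and
pfn_in_range are made up for illustration, and fc0000 is the boot-time
max pfn noted further down.

```shell
#!/bin/sh
# Assumption: 4 KiB pages, i.e. PAGE_SHIFT = 12 (true on x86_64).
PAGE_SHIFT=12

# Print, in hex, the page frame number that contains a physical address.
addr_to_pfn() {
    printf '%x\n' $(( $1 >> PAGE_SHIFT ))
}

# Succeed if pfn $1 lies in the half-open range [$2, $3).
# All three arguments are 0x-prefixed hex or plain decimal.
pfn_in_range() {
    [ $(( $1 >= $2 && $1 < $3 )) -eq 1 ]
}

pfn=$(addr_to_pfn 0xfc7ffb000)
echo "pfn: $pfn"    # prints "pfn: fc7ffb"

# fc0000 = boot-time max pfn, fc8000 = Normal zone end after ballooning.
if pfn_in_range "0x$pfn" 0xfc0000 0xfc8000; then
    echo "pfn is above the boot-time max pfn but inside the Normal zone"
fi
```

With the values from this report the range check succeeds, which is what
places the address in the memory added after boot.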

I disabled Ubuntu's memory hotadd (commented it out in
/lib/udev/rules.d/40-vm-hotadd.rules) and rebooted.  The Normal zone's
present pages were reduced so that the zone now ends at fc0000, matching
the boot-time max pfn.  I then tried to reproduce the problem, and it
seems to be gone!
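
To repeat this check before and after a change like that, the zone end
can be computed directly from zoneinfo-formatted text.  This is only a
sketch: zone_end_pfn is a hypothetical helper name, it assumes the
"spanned" and "start_pfn:" field layout shown in the excerpt earlier, and
on a multi-node machine it reports the last node that has the named zone.

```shell
#!/bin/sh
# Print the end pfn (start_pfn + spanned) of a zone, in hex.
# Reads /proc/zoneinfo-formatted text on stdin; $1 is the zone name.
zone_end_pfn() {
    awk -v zone="${1:-Normal}" '
        # A "Node N, zone NAME" line starts a new zone section.
        $1 == "Node"                  { in_zone = ($NF == zone); next }
        in_zone && $1 == "spanned"    { spanned = $2 }
        in_zone && $1 == "start_pfn:" { start = $2 }
        END {
            if (spanned == "" || start == "") exit 1
            printf "%x\n", start + spanned
        }
    '
}

# Demo using the figures quoted in this message.
zone_end_pfn Normal <<'EOF'
Node 0, zone   Normal
  pages free     15116671
        spanned  15499264
        present  15499264
  start_pfn:           1048576
EOF
# prints "fc8000"
```

On a live system the same function would be fed the real file, e.g.
zone_end_pfn Normal < /proc/zoneinfo.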

So I think that must be the issue; the hypervisor's NVMe driver isn't
expecting any pages from the Xen ballooned region.  I checked on Amazon
Linux, and saw why it isn't affected:

$ grep XEN_BALLOON /boot/config-4.4.41-36.55.amzn1.x86_64 
# CONFIG_XEN_BALLOON is not set

I suspect that skips quite a lot of problems for Amazon Linux, as the
Xen ballooning is quite annoying (see bug 1518457 comment 126, for
example).

Maybe Ubuntu should disable Xen ballooning for AWS as well?  If not,
then this seems to be a hypervisor bug: it needs to accept pages from
the ballooned region too.

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1668129

Title:
  Amazon I3 Instance Buffer I/O error on dev nvme0n1

Status in linux package in Ubuntu:
  Triaged
Status in linux source package in Xenial:
  Triaged

Bug description:
  On the AWS i3 instance class, putting the new NVMe storage disks under
  high I/O load causes data corruption and errors in dmesg:

  
  [  662.884390] blk_update_request: I/O error, dev nvme0n1, sector 120063912
  [  662.887824] Buffer I/O error on dev nvme0n1, logical block 14971093, lost async page write
  [  662.891254] Buffer I/O error on dev nvme0n1, logical block 14971094, lost async page write
  [  662.895591] Buffer I/O error on dev nvme0n1, logical block 14971095, lost async page write
  [  662.899873] Buffer I/O error on dev nvme0n1, logical block 14971096, lost async page write
  [  662.904179] Buffer I/O error on dev nvme0n1, logical block 14971097, lost async page write
  [  662.908458] Buffer I/O error on dev nvme0n1, logical block 14971098, lost async page write
  [  662.912287] Buffer I/O error on dev nvme0n1, logical block 14971099, lost async page write
  [  662.916047] Buffer I/O error on dev nvme0n1, logical block 14971100, lost async page write
  [  662.920285] Buffer I/O error on dev nvme0n1, logical block 14971101, lost async page write
  [  662.924565] Buffer I/O error on dev nvme0n1, logical block 14971102, lost async page write
  [  663.645530] blk_update_request: I/O error, dev nvme0n1, sector 120756912
  <snip>
  [ 1012.752265] blk_update_request: I/O error, dev nvme0n1, sector 3744
  [ 1012.755396] buffer_io_error: 194552 callbacks suppressed
  [ 1012.755398] Buffer I/O error on dev nvme0n1, logical block 20, lost async page write
  [ 1012.759248] Buffer I/O error on dev nvme0n1, logical block 21, lost async page write
  [ 1012.763368] Buffer I/O error on dev nvme0n1, logical block 22, lost async page write
  [ 1012.767271] Buffer I/O error on dev nvme0n1, logical block 23, lost async page write
  [ 1012.771314] Buffer I/O error on dev nvme0n1, logical block 24, lost async page write

  Able to replicate this with a bonnie++ stress test.

  bonnie++ -d /mnt/test/ -r 1000

  Linux i-0d76e144d85f487cf 4.4.0-64-generic #85-Ubuntu SMP Mon Feb 20 11:50:30 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
  --- 
  AlsaDevices:
   total 0
   crw-rw---- 1 root audio 116,  1 Feb 27 02:12 seq
   crw-rw---- 1 root audio 116, 33 Feb 27 02:12 timer
  AplayDevices: Error: [Errno 2] No such file or directory
  ApportVersion: 2.20.1-0ubuntu2.5
  Architecture: amd64
  ArecordDevices: Error: [Errno 2] No such file or directory
  AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
  CRDA: N/A
  DistroRelease: Ubuntu 16.04
  Ec2AMI: ami-bc62b2aa
  Ec2AMIManifest: (unknown)
  Ec2AvailabilityZone: us-east-1d
  Ec2InstanceType: i3.2xlarge
  Ec2Kernel: unavailable
  Ec2Ramdisk: unavailable
  IwConfig: Error: [Errno 2] No such file or directory
  JournalErrors:
   Error: command ['journalctl', '-b', '--priority=warning', '--lines=1000'] failed with exit code 1: Hint: You are currently not seeing messages from other users and the system.
         Users in the 'systemd-journal' group can see all messages. Pass -q to
         turn off this notice.
   No journal files were opened due to insufficient permissions.
  Lsusb: Error: command ['lsusb'] failed with exit code 1:
  MachineType: Xen HVM domU
  Package: linux (not installed)
  PciMultimedia:
   
  ProcEnviron:
   TERM=screen-256color
   PATH=(custom, no user)
   XDG_RUNTIME_DIR=<set>
   LANG=en_US.UTF-8
   SHELL=/bin/bash
  ProcFB:
   
  ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-4.4.0-64-generic root=UUID=cfda0544-9803-41e7-badb-43563085ff3a ro console=tty1 console=ttyS0
  ProcVersionSignature: Ubuntu 4.4.0-64.85-generic 4.4.44
  RelatedPackageVersions:
   linux-restricted-modules-4.4.0-64-generic N/A
   linux-backports-modules-4.4.0-64-generic  N/A
   linux-firmware                            N/A
  RfKill: Error: [Errno 2] No such file or directory
  Tags:  xenial ec2-images
  Uname: Linux 4.4.0-64-generic x86_64
  UpgradeStatus: No upgrade log present (probably fresh install)
  UserGroups:
   
  WifiSyslog:
   
  _MarkForUpload: True
  dmi.bios.date: 12/12/2016
  dmi.bios.vendor: Xen
  dmi.bios.version: 4.2.amazon
  dmi.chassis.type: 1
  dmi.chassis.vendor: Xen
  dmi.modalias: dmi:bvnXen:bvr4.2.amazon:bd12/12/2016:svnXen:pnHVMdomU:pvr4.2.amazon:cvnXen:ct1:cvr:
  dmi.product.name: HVM domU
  dmi.product.version: 4.2.amazon
  dmi.sys.vendor: Xen

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1668129/+subscriptions

-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp
