> We see an address of 0xfc7ffb000

Hi Matt,

I don't think you're accounting for the additional pages due to the Xen
balloon, are you? That increases physical memory, after boot. If you
check the /proc/zoneinfo file, look at the Normal zone's spanned pages
and start pfn, e.g.:

Node 0, zone   Normal
  pages free     15116671
        min      7661
        low      22873
        high     38085
   node_scanned  0
        spanned  15499264
        present  15499264
        managed  15212161
...
  start_pfn:         1048576

and so,

$ printf "%x\n" $[ 1048576 + 15499264 ]
fc8000

meaning that address you see is part of the pages in the balloon memory
region...

I disabled Ubuntu's memory hotadd (commented it out in
/lib/udev/rules.d/40-vm-hotadd.rules), and rebooted, and the Normal
zone's present pages was reduced so that the end is fc0000, matching the
boot time max pfn; I then tried to reproduce the problem and it seems
gone!

So I think that must be the issue; the hypervisor's NVMe driver isn't
expecting any pages from the Xen ballooned region.

I checked on Amazon Linux, and saw why it isn't affected:

$ grep XEN_BALLOON /boot/config-4.4.41-36.55.amzn1.x86_64
# CONFIG_XEN_BALLOON is not set

I suspect that skips quite a lot of problems for Amazon Linux, as the
Xen ballooning is quite annoying (see bug 1518457 comment 126, for
example).

Maybe Ubuntu should disable Xen ballooning for AWS also? If not, then
this seems to be a hypervisor bug, it needs to allow pages from the
ballooned region also.

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1668129

Title:
  Amazon I3 Instance Buffer I/O error on dev nvme0n1

Status in linux package in Ubuntu:
  Triaged
Status in linux source package in Xenial:
  Triaged

Bug description:
  On the AWS i3 instance class - when putting the new NVME storage disks
  under high IO load - seeing data corruption and errors in dmesg

  [ 662.884390] blk_update_request: I/O error, dev nvme0n1, sector 120063912
  [ 662.887824] Buffer I/O error on dev nvme0n1, logical block 14971093, lost async page write
  [ 662.891254] Buffer I/O error on dev nvme0n1, logical block 14971094, lost async page write
  [ 662.895591] Buffer I/O error on dev nvme0n1, logical block 14971095, lost async page write
  [ 662.899873] Buffer I/O error on dev nvme0n1, logical block 14971096, lost async page write
  [ 662.904179] Buffer I/O error on dev nvme0n1, logical block 14971097, lost async page write
  [ 662.908458] Buffer I/O error on dev nvme0n1, logical block 14971098, lost async page write
  [ 662.912287] Buffer I/O error on dev nvme0n1, logical block 14971099, lost async page write
  [ 662.916047] Buffer I/O error on dev nvme0n1, logical block 14971100, lost async page write
  [ 662.920285] Buffer I/O error on dev nvme0n1, logical block 14971101, lost async page write
  [ 662.924565] Buffer I/O error on dev nvme0n1, logical block 14971102, lost async page write
  [ 663.645530] blk_update_request: I/O error, dev nvme0n1, sector 120756912
  <snip>
  [ 1012.752265] blk_update_request: I/O error, dev nvme0n1, sector 3744
  [ 1012.755396] buffer_io_error: 194552 callbacks suppressed
  [ 1012.755398] Buffer I/O error on dev nvme0n1, logical block 20, lost async page write
  [ 1012.759248] Buffer I/O error on dev nvme0n1, logical block 21, lost async page write
  [ 1012.763368] Buffer I/O error on dev nvme0n1, logical block 22, lost async page write
  [ 1012.767271] Buffer I/O error on dev nvme0n1, logical block 23, lost async page write
  [ 1012.771314] Buffer I/O error on dev nvme0n1, logical block 24, lost async page write

  Able to replicate this with a bonnie++ stress test.

  bonnie++ -d /mnt/test/ -r 1000

  Linux i-0d76e144d85f487cf 4.4.0-64-generic #85-Ubuntu SMP Mon Feb 20 11:50:30 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
  ---
  AlsaDevices:
   total 0
   crw-rw---- 1 root audio 116,  1 Feb 27 02:12 seq
   crw-rw---- 1 root audio 116, 33 Feb 27 02:12 timer
  AplayDevices: Error: [Errno 2] No such file or directory
  ApportVersion: 2.20.1-0ubuntu2.5
  Architecture: amd64
  ArecordDevices: Error: [Errno 2] No such file or directory
  AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
  CRDA: N/A
  DistroRelease: Ubuntu 16.04
  Ec2AMI: ami-bc62b2aa
  Ec2AMIManifest: (unknown)
  Ec2AvailabilityZone: us-east-1d
  Ec2InstanceType: i3.2xlarge
  Ec2Kernel: unavailable
  Ec2Ramdisk: unavailable
  IwConfig: Error: [Errno 2] No such file or directory
  JournalErrors:
   Error: command ['journalctl', '-b', '--priority=warning', '--lines=1000'] failed with exit code 1: Hint: You are currently not seeing messages from other users and the system.
         Users in the 'systemd-journal' group can see all messages. Pass -q to
         turn off this notice.
   No journal files were opened due to insufficient permissions.
  Lsusb: Error: command ['lsusb'] failed with exit code 1:
  MachineType: Xen HVM domU
  Package: linux (not installed)
  PciMultimedia:

  ProcEnviron:
   TERM=screen-256color
   PATH=(custom, no user)
   XDG_RUNTIME_DIR=<set>
   LANG=en_US.UTF-8
   SHELL=/bin/bash
  ProcFB:

  ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-4.4.0-64-generic root=UUID=cfda0544-9803-41e7-badb-43563085ff3a ro console=tty1 console=ttyS0
  ProcVersionSignature: Ubuntu 4.4.0-64.85-generic 4.4.44
  RelatedPackageVersions:
   linux-restricted-modules-4.4.0-64-generic N/A
   linux-backports-modules-4.4.0-64-generic  N/A
   linux-firmware                            N/A
  RfKill: Error: [Errno 2] No such file or directory
  Tags:  xenial ec2-images
  Uname: Linux 4.4.0-64-generic x86_64
  UpgradeStatus: No upgrade log present (probably fresh install)
  UserGroups:

  WifiSyslog:

  _MarkForUpload: True
  dmi.bios.date: 12/12/2016
  dmi.bios.vendor: Xen
  dmi.bios.version: 4.2.amazon
  dmi.chassis.type: 1
  dmi.chassis.vendor: Xen
  dmi.modalias: dmi:bvnXen:bvr4.2.amazon:bd12/12/2016:svnXen:pnHVMdomU:pvr4.2.amazon:cvnXen:ct1:cvr:
  dmi.product.name: HVM domU
  dmi.product.version: 4.2.amazon
  dmi.sys.vendor: Xen

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1668129/+subscriptions

-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp