Public bug reported:

[Impact]

AWS wish to implement per-device IO timeouts in their cloud, since
currently NVMe devices only support a single global timeout, and this
doesn't play well with EBS volumes, which have error recovery
capabilities built into the back end, and require large timeouts, and
Instance Store / ephemeral volumes, which are to be treated as local
disks, which require short timeouts.

AWS have proposed a solution which is to backport the below two patches
to the Ubuntu kernels:

commit 65cd1d13b880920054d6c750679baa80b7f9c072
Author: Weiping Zhang <[email protected]>
Date:   Thu Nov 29 00:04:39 2018 +0800
subject: block: add io timeout to sysfs

commit 4d25339e32a1b6e1f490bb78b1e5b0fa9eb3e073
Author: Weiping Zhang <[email protected]>
Date:   Tue Apr 2 21:14:30 2019 +0800
subject: block: don't show io_timeout if driver has no timeout handler

This enables a sysfs entry in /sys/block/nvmeXnX/queue/io_timeout which
gets and sets the io_timeout per device in milliseconds.

Kernel commits are being tracked in LP #1841461

EBS volumes will use the default timeout as set on the kernel command
line of 4294966296, and Instance Store volumes will need to use a
default timeout of 30000.

AWS have suggested that we deploy the below udev rule to automatically
set the io_timeout of all Instance Store volumes to 30000:

KERNEL=="nvme[0-9]*n[0-9]*", ENV{DEVTYPE}=="disk", ATTRS{model}=="Amazon
EC2 NVMe Instance Storage", ATTR{queue/io_timeout}="30000"

This bug is to add the above udev rule to cloud-init.

[Test Case]

This requires an AWS instance that has Instance Store volumes
configured, and I suggest using c5d.large instances.

I have built a test kernel for bionic linux-aws, version
4.15.0-1043.45+hf240347v20190828b2 which is available from here:

https://launchpad.net/~mruffell/+archive/ubuntu/sf240347-kernel

Install the kernel with the below:

1) sudo add-apt-repository ppa:mruffell/sf240347-kernel
2) sudo apt-get update

Modify grub to boot it, this kernel is 1043, and current is 1044, so it
will likely be "1>2" in grub config:

3) sudo vim /etc/default/grub
Change GRUB_DEFAULT=0 to GRUB_DEFAULT="1>2"
4) sudo update-grub
5) reboot

Once system is up, check kernel version:
6) uname -rv
4.15.0-1043-aws #45+hf240347v20190828b2-Ubuntu SMP Wed Aug 28 06:08:21 UTC 2019

Verify that we have two nvme disks, one EBS and one Instance Store:

7) lsblk 
Should have two disks, normally nvme0 and nvme1.

See what device is what:

8) sudo udevadm info --attribute-walk /dev/nvme0
For me, ATTR{model} is "Amazon Elastic Block Store"

9) sudo udevadm info --attribute-walk /dev/nvme1
For me, ATTR{model} is "Amazon EC2 NVMe Instance Storage"

Look at the two timeouts (Note no udev rule yet):

10) cat /sys/block/{nvme0n1,nvme1n1}/queue/io_timeout
4294966296
4294966296

Now we deploy the udev rule:
Place the following line in /lib/udev/rules.d/66-aws-io-timeout.rules

KERNEL=="nvme[0-9]*n[0-9]*", ENV{DEVTYPE}=="disk", ATTRS{model}=="Amazon
EC2 NVMe Instance Storage", ATTR{queue/io_timeout}="30000"

Now trigger udev rules:
11) sudo udevadm trigger

Look at the timeouts now:

12) cat /sys/block/{nvme0n1,nvme1n1}/queue/io_timeout
4294966296
30000

[Regression Potential]

Regression potential is low since we are adding a udev rule which
applies only to AWS instances, and only for instances which support
Instance Store devices.

The only thing being modified is the device timeout and the udev rule is
robust to device reordering as it goes by model attr information.

When the udev rule is used with unpatched kernels, nothing happens since
the sysfs entry does not exist, and no errors or the like are reported.

[Other Info]
cloud-init appears to carry azure specific udev rules, which makes me think 
that cloud-init is the right place for this requested udev rule to live.

** Affects: cloud-init (Ubuntu)
     Importance: Medium
     Assignee: Matthew Ruffell (mruffell)
         Status: In Progress

** Affects: cloud-init (Ubuntu Xenial)
     Importance: Medium
     Assignee: Matthew Ruffell (mruffell)
         Status: In Progress

** Affects: cloud-init (Ubuntu Bionic)
     Importance: Medium
     Assignee: Matthew Ruffell (mruffell)
         Status: In Progress

** Affects: cloud-init (Ubuntu Disco)
     Importance: Medium
     Assignee: Matthew Ruffell (mruffell)
         Status: In Progress

** Affects: cloud-init (Ubuntu Eoan)
     Importance: Medium
     Assignee: Matthew Ruffell (mruffell)
         Status: In Progress


** Tags: sts

** Also affects: cloud-init (Ubuntu Disco)
   Importance: Undecided
       Status: New

** Also affects: cloud-init (Ubuntu Xenial)
   Importance: Undecided
       Status: New

** Also affects: cloud-init (Ubuntu Eoan)
   Importance: Undecided
       Status: New

** Also affects: cloud-init (Ubuntu Bionic)
   Importance: Undecided
       Status: New

** Changed in: cloud-init (Ubuntu Xenial)
       Status: New => In Progress

** Changed in: cloud-init (Ubuntu Bionic)
       Status: New => In Progress

** Changed in: cloud-init (Ubuntu Disco)
       Status: New => In Progress

** Changed in: cloud-init (Ubuntu Eoan)
       Status: New => In Progress

** Changed in: cloud-init (Ubuntu Xenial)
     Assignee: (unassigned) => Matthew Ruffell (mruffell)

** Changed in: cloud-init (Ubuntu Bionic)
     Assignee: (unassigned) => Matthew Ruffell (mruffell)

** Changed in: cloud-init (Ubuntu Disco)
     Assignee: (unassigned) => Matthew Ruffell (mruffell)

** Changed in: cloud-init (Ubuntu Eoan)
     Assignee: (unassigned) => Matthew Ruffell (mruffell)

** Tags added: sts

** Changed in: cloud-init (Ubuntu Xenial)
   Importance: Undecided => Medium

** Changed in: cloud-init (Ubuntu Bionic)
   Importance: Undecided => Medium

** Changed in: cloud-init (Ubuntu Disco)
   Importance: Undecided => Medium

** Changed in: cloud-init (Ubuntu Eoan)
   Importance: Undecided => Medium

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1842562

Title:
  AWS: Add udev rule to set Instance Store device IO timeouts

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/cloud-init/+bug/1842562/+subscriptions

-- 
ubuntu-bugs mailing list
[email protected]
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

Reply via email to