Public bug reported:

1-) Ubuntu Release: Ubuntu 14.04.5 LTS

2-) linux-image-3.13.0-143-generic and linux-image-4.4.0-1014-aws

3-) mkfs.xfs and fstrim -v on a raid0 array built with md on top of nvme
should not take more than a few seconds to complete.

4-) Formatting the raid0 array with xfs took around 2 hours. Running
fstrim -v on the mount point on top of the raid array also took around
2 hours.

How to reproduce the issue:

- Launch an i3.4xlarge instance on Amazon AWS using an Ubuntu 14.04.5 AMI
(ami-78d2be01 on EU-WEST-1). This creates an instance with one 8 GB EBS root
volume and two 1.9 TB SSD drives that are presented to the instance through
the nvme driver (a quick layout check is sketched below).
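
(Not part of the original report: to confirm the storage layout the AMI
presents before building the array, something like the following can be
used; the exact output will vary per instance.)

 # lsblk -o NAME,SIZE,TYPE
 # ls -l /dev/nvme*n1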
- Compose a raid0 array with the following command:

 # mdadm --create --verbose --level=0 /dev/md0 --raid-devices=2 /dev/nvme0n1 /dev/nvme1n1
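
(Not from the original report: the array state can be verified before
formatting with the usual md tools.)

 # cat /proc/mdstat
 # mdadm --detail /dev/md0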

- Formatting the raid0 array (/dev/md0) with xfs takes around 2 hours to
complete. With other AMIs such as RHEL7, CentOS7 and Ubuntu 18.04 the same
operation took around 2 seconds.

root@ip-172-31-30-133:~# time mkfs.xfs /dev/md0

real    120m45.725s
user    0m0.000s
sys     0m18.248s
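
(Suggestion, not something this report ran: mkfs.xfs can skip the initial
discard with -K, which should confirm whether the time is spent in the
discard pass rather than in writing metadata.)

 # time mkfs.xfs -K /dev/md0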

- Running fstrim -v on a filesystem mounted on top of /dev/md0 can take
around 2 hours to complete (see the reproduction sketch below). With other
AMIs such as RHEL7, CentOS7 and Ubuntu 18.04 the time needed was around
2 seconds.
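
(Reconstructed reproduction sketch, mount point /mnt assumed; not copied
from the report.)

 # mount /dev/md0 /mnt
 # time fstrim -v /mnt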

- When I try the same with either nvme SSD device on its own, e.g.
/dev/nvme0n1, the issue doesn't happen.
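
(Not from the original report: since the slowdown only appears with md on
top of nvme, comparing the discard limits each block device advertises may
help narrow it down; these are the standard sysfs queue attributes.)

 # grep . /sys/block/nvme0n1/queue/discard_granularity \
          /sys/block/nvme0n1/queue/discard_max_bytes \
          /sys/block/md0/queue/discard_granularity \
          /sys/block/md0/queue/discard_max_bytes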

- I tried to replicate the issue using LVM with striping instead of md;
there, mkfs.xfs and fstrim complete without taking hours:

root@ip-172-31-27-69:~# pvcreate /dev/nvme0n1
  Physical volume "/dev/nvme0n1" successfully created

root@ip-172-31-27-69:~# pvcreate /dev/nvme1n1
  Physical volume "/dev/nvme1n1" successfully created

root@ip-172-31-27-69:~# vgcreate raid0 /dev/nvme0n1 /dev/nvme1n1
  Volume group "raid0" successfully created

root@ip-172-31-27-69:~# lvcreate --type striped --stripes 2 --extents 100%FREE raid0 /dev/nvme0n1 /dev/nvme1n1
  Using default stripesize 64.00 KiB.
  Logical volume "lvol0" created.

root@ip-172-31-27-69:~# vgchange -ay
  1 logical volume(s) in volume group "raid0" now active

root@ip-172-31-27-69:~# lvchange -ay /dev/raid0/lvol0

root@ip-172-31-27-69:~# lvs -a /dev/raid0/lvol0
  LV    VG    Attr       LSize Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  lvol0 raid0 -wi-a----- 3.46t
root@ip-172-31-27-69:~# time mkfs.xfs /dev/raid0/lvol0
meta-data=/dev/raid0/lvol0       isize=512    agcount=32, agsize=28991664 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=0
data     =                       bsize=4096   blocks=927733248, imaxpct=5
         =                       sunit=16     swidth=32 blks
naming   =version 2              bsize=4096   ascii-ci=0 ftype=1
log      =internal log           bsize=4096   blocks=453008, version=2
         =                       sectsz=512   sunit=16 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0

real    0m2.926s
user    0m0.180s
sys     0m0.000s


root@ip-172-31-27-69:~# mount /dev/raid0/lvol0 /mnt

root@ip-172-31-27-69:~# time fstrim -v /mnt
/mnt: 3.5 TiB (3798138650624 bytes) trimmed

real    0m1.794s
user    0m0.000s
sys     0m0.000s

So the issue only happens when using nvme and md to compose the raid0
array.
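
(A possible next diagnostic step, not run for this report: tracing the
member devices while a trim is in flight would show how many discard
requests md forwards and how large they are. A 30-second capture is
assumed here.)

 # blktrace -w 30 -d /dev/nvme0n1 -d /dev/nvme1n1 &
 # fstrim -v /mnt
 # wait
 # blkparse nvme0n1 nvme1n1 | less

Discard requests appear with a D in the RWBS column of the blkparse output,
so the request count and sizes seen under md can be compared against a trim
of a single nvme device.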

Below is some additional information that may be useful:

I started formatting the md array with mkfs.xfs. The process appears to be hung:

root@ip-172-31-24-66:~# ps aux | grep -i mkfs.xfs
root       1693 12.0  0.0  12728   968 pts/1    D+   07:54   0:03 mkfs.xfs /dev/md0

PID 1693 is in uninterruptible sleep (D).

Looking at /proc/1693/stack:

root@ip-172-31-24-66:~# cat /proc/1693/stack
[<ffffffff8134d8c2>] blkdev_issue_discard+0x232/0x2a0
[<ffffffff813524bd>] blkdev_ioctl+0x61d/0x7d0
[<ffffffff811ff6f1>] block_ioctl+0x41/0x50
[<ffffffff811d89b3>] do_vfs_ioctl+0x2e3/0x4d0
[<ffffffff811d8c21>] SyS_ioctl+0x81/0xa0
[<ffffffff81748030>] system_call_fastpath+0x1a/0x1f
[<ffffffffffffffff>] 0xffffffffffffffff


Looking at the stack, the process appears to be blocked in a discard operation:

root@ip-172-31-24-66:~# ps -flp 1693
F S UID         PID   PPID  C PRI  NI ADDR SZ WCHAN  STIME TTY          TIME CMD
4 D root       1693   1682  2  80   0 -  3182 blkdev 07:54 pts/1    00:00:03 mkfs.xfs /dev/md0


root@ip-172-31-24-66:~# cat /proc/1693/wchan
blkdev_issue_discard

The process is stuck in blkdev_issue_discard.
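
(Generic follow-up, not captured in this report: with sysrq enabled, the
kernel stacks of all tasks in uninterruptible sleep can be dumped to the
kernel log while mkfs.xfs or fstrim is blocked, which would also show where
the md and nvme layers are waiting.)

 # echo w > /proc/sysrq-trigger
 # dmesg | tail -n 60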

** Affects: linux-aws (Ubuntu)
     Importance: Undecided
         Status: New

https://bugs.launchpad.net/bugs/1756311

Title:
  fstrim and discard operations take too long to complete

Status in linux-aws package in Ubuntu:
  New
