Ryan,
   From the logs, the concern is the "Device or resource busy" message:

Running command ['lvremove', '--force', '--force', 'vgk/sdklv'] with allowed return codes [0] (capture=False)
  device-mapper: remove ioctl on  (253:5) failed: Device or resource busy
  Logical volume "sdklv" successfully removed
Running command ['lvdisplay', '-C', '--separator', '=', '--noheadings', '-o', 'vg_name,lv_name'] with allowed return codes [0] (capture=True)
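
To double-check whether the LV and its device-mapper node are really
gone after that message, something like the following could be run right
after the wipe.  This is only a rough sketch: 'vgk'/'sdklv' are just the
names from the log above, and the dm name check assumes the VG/LV names
contain no hyphens (LVM doubles hyphens in dm names):

    import subprocess

    def lv_really_removed(vg, lv):
        """Return True if LVM no longer lists vg/lv and its dm node is gone."""
        # Same query curtin runs above: one "vg=lv" pair per line.
        out = subprocess.run(
            ['lvdisplay', '-C', '--separator', '=', '--noheadings',
             '-o', 'vg_name,lv_name'],
            capture_output=True, text=True, check=True).stdout
        still_listed = any(line.strip() == '%s=%s' % (vg, lv)
                           for line in out.splitlines())
        # dmsetup info exits non-zero once the mapping no longer exists.
        dm_gone = subprocess.run(['dmsetup', 'info', '%s-%s' % (vg, lv)],
                                 capture_output=True).returncode != 0
        return not still_listed and dm_gone

    print(lv_really_removed('vgk', 'sdklv'))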

  Curtin does not fail and the node successfully deploys.  This is in an
integration lab, so these hosts (including the MAAS host) are stopped,
MAAS is reinstalled, and the systems are redeployed without ever being
released in MAAS, so there is no opportunity to wipe the disks during a
MAAS release.  MAAS then deploys Bionic on these hosts as if they were
completely new systems, but in reality they still have the old volumes
configured.  MAAS configures the root disk but does nothing to the other
disks, which are provisioned through other automation later.  The
customer has correlated this with problems configuring Ceph after
deployment.  I have requested further information about the exact state
of the system when it ends up in this condition.
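
For reference, this is roughly the state information I am after, to be
collected as root on an affected node after redeploy (just a sketch; the
exact lsblk columns are not important):

    import subprocess

    # Anything left over from the previous deployment should show up in
    # one of these listings (stale PVs/VGs/LVs and their dm nodes).
    for cmd in (['lsblk', '-o', 'NAME,SIZE,TYPE,FSTYPE,MOUNTPOINT'],
                ['pvs'], ['vgs'], ['lvs', '-a'],
                ['dmsetup', 'ls', '--tree']):
        print('### ' + ' '.join(cmd))
        subprocess.run(cmd, check=False)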

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1871874

Title:
  lvremove occasionally fails on nodes with multiple volumes and curtin
  does not catch the failure

Status in curtin package in Ubuntu:
  Incomplete
Status in linux package in Ubuntu:
  Incomplete

Bug description:
  For example:

  Wiping lvm logical volume: /dev/ceph-db-wal-dev-sdc/ceph-db-dev-sdi
  wiping 1M on /dev/ceph-db-wal-dev-sdc/ceph-db-dev-sdi at offsets [0, -1048576]
  using "lvremove" on ceph-db-wal-dev-sdc/ceph-db-dev-sdi
  Running command ['lvremove', '--force', '--force', 'ceph-db-wal-dev-sdc/ceph-db-dev-sdi'] with allowed return codes [0] (capture=False)
  device-mapper: remove ioctl on (253:14) failed: Device or resource busy
  Logical volume "ceph-db-dev-sdi" successfully removed

  On a node with 10 disks configured as follows:

  /dev/sda2 /
  /dev/sda1 /boot
  /dev/sda3 /var/log
  /dev/sda5 /var/crash
  /dev/sda6 /var/lib/openstack-helm
  /dev/sda7 /var
  /dev/sdj1 /srv

  sdb and sdc are used for BlueStore WAL and DB
  sdd, sde, sdf: ceph OSDs, using sdb
  sdg, sdh, sdi: ceph OSDs, using sdc

  Across multiple servers this happens occasionally with various disks.
  It looks like this may be a race condition, possibly in LVM, as curtin
  is wiping multiple volumes before the lvm failure occurs.
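
  If this is a race with udev still processing events for volumes curtin
  has just wiped, one possible mitigation (a rough sketch only, not
  curtin's actual code) would be to let udev settle before each
  lvremove:

      import subprocess

      def settle_then_lvremove(vg_lv):
          # Flush pending udev events so nothing still holds the dm node
          # open when the remove ioctl runs, then remove the LV.
          subprocess.run(['udevadm', 'settle'], check=False)
          subprocess.run(['lvremove', '--force', '--force', vg_lv],
                         check=True)

      # e.g. for the volume from the log above:
      settle_then_lvremove('ceph-db-wal-dev-sdc/ceph-db-dev-sdi')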

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/curtin/+bug/1871874/+subscriptions
