Hello,

I'm currently testing 9.0.15-0rc1 on a 3-node PVE cluster.
Pkg versions:
------------------
cat /proc/drbd
version: 9.0.15-0rc1 (api:2/proto:86-114)
GIT-hash: fc844fc366933c60f7303694ca1dea734dcb39bb build by root@pve1, 2018-07-23 18:47:08
Transports (api:16): tcp (9.0.15-0rc1)

ii python-drbdmanage 0.99.18-1
ii drbdmanage-proxmox 2.2-1
ii drbd-utils 9.5.0-1
---------------------

Resource = vm-122-disk-1
Replica count = 3
PVE nodes = pve1, pve2, pve3

The resource is active on pve2 (Primary); the other two nodes (pve1, pve3) are Secondary.

I tried to live-migrate the VM from pve2 to pve3 and the process got stuck just before starting. Inspecting dmesg on both nodes (pve2, pve3), I see the following crash:

pve2 (Primary) node:
https://privatebin.net/?fb5435a42b431af2#4xZpd9D5bYnB000+H3K0noZmkX20fTwGSziv5oO/Zlg=

pve3 (Secondary) node:
https://privatebin.net/?d3b1638fecb6728f#2StXbwDPT0JlFUKf686RJiR+4hl52jEmmij2UTtnSjs=

I cancelled the migration, but now it's impossible to change the state of the DRBD resource (vm-122-disk-1) in any way (switch from Primary to Secondary, disconnect, bring the resource down, etc.) on pve3 or pve2:

root@pve3:~# drbdadm down vm-122-disk-1
vm-122-disk-1: State change failed: (-12) Device is held open by someone
additional info from kernel:
failed to demote
Command 'drbdsetup down vm-122-disk-1' terminated with exit code 11

I can't find any apparent process locking the specific resource on pve3 using lsof.

Is there a way to recover from this without rebooting each node?

Thanks
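P.S. In case it matters, here are a few more ways I know of to look for the holder beyond lsof. The device node /dev/drbd100 below is only a placeholder; substitute whatever minor vm-122-disk-1 actually maps to (ls -l /dev/drbd/by-res/vm-122-disk-1/ shows the symlink target):

------------------
# Per-device state including the open count;
# "open:yes" means something still holds the device node open:
drbdsetup status vm-122-disk-1 --verbose --statistics

# Ask the kernel which processes have the device open or mapped:
fuser -vm /dev/drbd100

# Catch file descriptors lsof might have missed:
find /proc/[0-9]*/fd -lname '/dev/drbd*' 2>/dev/null

# In-kernel stacking (e.g. device-mapper on top of the DRBD device)
# never shows up in lsof; check the holders directory instead:
ls /sys/class/block/drbd100/holders/
---------------------

If none of these turn up a userspace holder either, I assume the reference is stuck inside the kernel after the crash above, which would explain why lsof sees nothing.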
