Hello,

I'm currently testing 9.0.15-0rc1 on a 3-node PVE cluster.
Pkg versions:
------------------
cat /proc/drbd
version: 9.0.15-0rc1 (api:2/proto:86-114)
GIT-hash: fc844fc366933c60f7303694ca1dea734dcb39bb build by root@pve1, 2018-07-23 18:47:08
Transports (api:16): tcp (9.0.15-0rc1)

ii python-drbdmanage 0.99.18-1
ii drbdmanage-proxmox 2.2-1
ii drbd-utils 9.5.0-1
---------------------

Resource = vm-122-disk-1
Replica count = 3
PVE nodes = pve1, pve2, pve3

The resource is active on pve2 (Primary); the other two nodes (pve1, pve3) are Secondary.

I tried to live-migrate the VM from pve2 to pve3 and the process got stuck just before starting. Inspecting dmesg on both nodes (pve2, pve3), I see the following crash:

pve2 (Primary) node:
https://privatebin.net/?fb5435a42b431af2#4xZpd9D5bYnB000+H3K0noZmkX20fTwGSziv5oO/Zlg=

pve3 (Secondary) node:
https://privatebin.net/?d3b1638fecb6728f#2StXbwDPT0JlFUKf686RJiR+4hl52jEmmij2UTtnSjs=

I cancelled the migration, but now it's impossible to change the state of the DRBD resource (vm-122-disk-1) in any way (switch from Primary to Secondary, disconnect, bring the resource down, etc.) on pve3 or pve2:

root@pve3:~# drbdadm down vm-122-disk-1
vm-122-disk-1: State change failed: (-12) Device is held open by someone
additional info from kernel:
failed to demote
Command 'drbdsetup down vm-122-disk-1' terminated with exit code 11

I can't find any apparent process locking the specific resource on pve3 using lsof.

Is there a way to recover from this without rebooting each node?

Thanks
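P.S. In case it matters, here are a few more ways I know of to look for the holder beyond lsof. The device node /dev/drbd100 below is only a placeholder; substitute whatever minor vm-122-disk-1 actually maps to (ls -l /dev/drbd/by-res/vm-122-disk-1/ shows the symlink target):

------------------
# Per-device state including the open count;
# "open:yes" means something still holds the device node open:
drbdsetup status vm-122-disk-1 --verbose --statistics

# Ask the kernel which processes have the device open or mapped:
fuser -vm /dev/drbd100

# Catch file descriptors lsof might have missed:
find /proc/[0-9]*/fd -lname '/dev/drbd*' 2>/dev/null

# In-kernel stacking (e.g. device-mapper on top of the DRBD device)
# never shows up in lsof; check the holders directory instead:
ls /sys/class/block/drbd100/holders/
---------------------

If none of these turn up a userspace holder either, I assume the reference is stuck inside the kernel after the crash above, which would explain why lsof sees nothing.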
