Hi,

we had the same problems with Debian Wheezy, LVM2 and DRBD. However, this does not seem to be DRBD-related. It looks like a problem between LVM and udevd.

See: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=549691

Stopping udevd before taking the snapshot and starting it again after removing the snapshot solved the problem for us. It's only a workaround, but it works for us.

Regards
Urban

On 26.07.2013 17:14, Frank Steinborn wrote:
Hi,

we are a bit further in debugging this. We installed a DELL PowerEdge r620 (same hardware as used in our DRBD cluster where this problem happens).

As no one in this thread brought DRBD into play, I didn't expect any interaction with it related to this bug. However, we were not able to reproduce it with just LVM2 (e.g. configure LV, do IO in LV, remove LV, hang).

So we installed a second machine and put DRBD on top of the LVs. And voila, as soon as we create a snapshot of the LV where DRBD is on top and remove this snapshot, it fails about 1/3 of the time.

Some facts:

root@drbd-primary:~# lvremove --force /dev/vg0/lv0-snap
  Unable to deactivate open vg0-lv0--snap-cow (254:3)
  Failed to resume lv0-snap.
  libdevmapper exiting with 1 device(s) still suspended.

After this, "dmsetup info" gives the following output:

<<< snip >>>
Name:              vg0-lv0--snap
State:             ACTIVE
Read Ahead:        256
Tables present:    LIVE
Open count:        0
Event number:      0
Major, minor:      254, 1
Number of targets: 1
UUID: LVM-M0Z897O16CAiYbSivOzgSn0M9Ae9TdoYy4WFhwy43CZA1g7zKFGF915pLAOIPvFZ

Name:              vg0-lv0-real
State:             ACTIVE
Read Ahead:        0
Tables present:    LIVE
Open count:        1
Event number:      0
Major, minor:      254, 2
Number of targets: 1
UUID: LVM-M0Z897O16CAiYbSivOzgSn0M9Ae9TdoYC3ppjt1CZ3AcZR2hNz1VT5CHdM4RR32j-real

Name:              vg0-lv0
State:             SUSPENDED
Read Ahead:        256
Tables present:    LIVE & INACTIVE
Open count:        2
Event number:      0
Major, minor:      254, 0
Number of targets: 1
UUID: LVM-M0Z897O16CAiYbSivOzgSn0M9Ae9TdoYC3ppjt1CZ3AcZR2hNz1VT5CHdM4RR32j

Name:              vg0-lv0--snap-cow
State:             ACTIVE
Read Ahead:        0
Tables present:    LIVE
Open count:        0
Event number:      0
Major, minor:      254, 3
Number of targets: 1
UUID: LVM-M0Z897O16CAiYbSivOzgSn0M9Ae9TdoYy4WFhwy43CZA1g7zKFGF915pLAOIPvFZ-cow
<<< snap >>>

As you can see, the real LV with DRBD on top is now in state SUSPENDED, which causes the cluster to be non-functional, as IO operations stall on both the primary and secondary node until one does "dmsetup resume /dev/vg0/lv0".
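To spot this condition without reading the whole listing by eye, the "dmsetup info" output can be scanned for devices whose State is SUSPENDED. The following is only a minimal sketch, assuming the "Key: value" layout shown above; the function names and the shortened SAMPLE text are made up for illustration and are not part of LVM2 or device-mapper.

```python
def parse_dmsetup_info(text):
    """Split `dmsetup info` style output into one dict per device.

    Assumes every record starts with a "Name:" line and that each line
    has the form "Key: value", as in the listing quoted in this thread.
    """
    devices = []
    current = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        key, sep, value = line.partition(":")
        if not sep:
            continue  # not a "Key: value" line, e.g. a snip marker
        key = key.strip()
        if key == "Name" and current:
            devices.append(current)  # a new record begins
            current = {}
        current[key] = value.strip()
    if current:
        devices.append(current)
    return devices


def suspended_devices(text):
    """Return the names of all devices reported as SUSPENDED."""
    return [d["Name"] for d in parse_dmsetup_info(text)
            if d.get("State") == "SUSPENDED"]


# Shortened excerpt of the output quoted above (illustrative only).
SAMPLE = """\
Name:              vg0-lv0-real
State:             ACTIVE
Open count:        1
Major, minor:      254, 2

Name:              vg0-lv0
State:             SUSPENDED
Tables present:    LIVE & INACTIVE
Open count:        2
Major, minor:      254, 0
"""

if __name__ == "__main__":
    for name in suspended_devices(SAMPLE):
        print("suspended:", name)
```

On a live system the text would come from running "dmsetup info" as root rather than from SAMPLE. Any device listed this way still needs a manual "dmsetup resume" as described above; the sketch only reports, it does not fix anything.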
Another interesting issue we've seen: after doing "dmsetup resume /dev/vg0/lv0", lv0-snap doesn't appear to be a snapshot anymore, given the output of lvs (lv0-snap has no origin anymore):

  LV       VG  Attr     LSize   Pool Origin Data%  Move Log Copy%  Convert
  lv0      vg0 -wi-ao-- 200.00g
  lv0-snap vg0 -wi-a---  40.00g

Some miscellaneous notes:

* It _feels_ like it only happens when the snapshot is at least around 50-60% full.
* We can trigger something like this even without DRBD. When triggered that way, however, the LV never ends up in SUSPENDED state and a second try of lvremove always succeeds.

That's all we have so far. I already had a private conversation with [email protected] on this and we will (probably) provide him remote access to this system as soon as we have the setup reachable from the outside.

Please let me know if I can provide any more information to get this fixed.

I put drbd-dev in cc, maybe someone over there has an idea on this?

@drbd-dev: system is debian wheezy, w/ drbd 8.3.11, lvm2 2.02.95.

Thanks,
Frank

