Hi,

11.11.2011 10:25, Ulrich Windl wrote:
> Hi!
> 
> I found some obscure problem having to do with LVM multipathing and
> hot-plugged disks:
> 
> I have written some RAs that support "hotplugging of SAN disks" via
> NPIV (N_Port ID Virtualization) and addition and removal of multipath
> maps. On top of that sit LVM and filesystems.
> 
> So far, so good. However I discovered a problem when multiple
> resources are shut down in parallel: the LVM tools (like vgdisplay)
> access all disks that are present, not just the disks that matter.
> This may lead to a race condition where one resource group stops an
> LVM monitor, then shuts down the corresponding multipath, and finally
> the NPIV device (SCSI unplug). Unfortunately, during that window
> another LVM command may access the disks that are cleared for removal.
> 
> I don't know what exactly happened, but the result was that several
> vgdisplay commands hung (unkillable even with kill -9), multipath
> commands hung (device busy through LVM?), and the device could not be
> removed. It seems some rather global lock is involved that makes more
> and more commands hang.


I experienced the same problem, and the solution was three steps:
1. Exclude all LVs from being scanned for VGs. I automatically edit
lvm.conf (the "filter" line there), adding each new device when it
appears in the system. The default policy is r/.*/ (reject everything).
Otherwise LVM accesses/scans all non-filtered block devices every time
you run an LVM command. If you have 1000 LVs in some VG, they will all
be scanned on every request after you activate that VG, and that will
slow down subsequent LVM commands dramatically. Every request to those
LVs consumes some IO, and if you are IO-bound, it will take a really
long time.
2. Raise the scheduling priority of dlm_controld, of the dlm kernel
threads (not sure this has any effect), and of clvmd (I run clustered
LVM), and the priority of LVM commands in the RA (with chrt -r 10).
3. I use timeout(1) to run LVM commands from the RA, because yes, LVM
commands may hang under high IO load. And I retry the same command on
timeout.
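For step 1, the filter ends up looking something like this (a sketch
only; the mpath device pattern is an assumption, adjust it to whatever
names your multipath maps get on your system):

```
# /etc/lvm/lvm.conf, devices section (patterns are examples only)
devices {
    # accept only the multipath maps that carry PVs, reject everything
    # else, including the /dev/VG/LV nodes of already-activated VGs
    filter = [ "a|^/dev/mapper/mpath|", "r|.*|" ]
}
```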
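Step 2 can be sketched like this (boost_prio is a hypothetical helper
name, not from my RA; it needs root, and priority 10 is the value from
the text):

```shell
# boost_prio NAME PRIO -- set SCHED_RR priority PRIO on every process
# named NAME, using pidof to find them and chrt to change the policy
boost_prio() {
    local pid
    for pid in $(pidof "$1"); do
        chrt -r -p "$2" "$pid" || return 1
    done
}
# e.g. boost_prio dlm_controld 10; boost_prio clvmd 10
```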
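Step 3 might look like the following (the run_lvm name, the 30 second
limit, and the single retry are all assumptions, not from my actual RA):

```shell
# run_lvm CMD ARGS... -- run an LVM command under timeout(1), retrying
# once if it was killed on expiry; other failures are returned as-is
run_lvm() {
    local limit="${LVM_TIMEOUT:-30}" tries=0 rc
    while [ "$tries" -lt 2 ]; do
        timeout "$limit" "$@"
        rc=$?
        [ "$rc" -eq 0 ] && return 0
        # timeout(1) exits with status 124 when it killed the command
        [ "$rc" -eq 124 ] || return "$rc"
        tries=$((tries + 1))
    done
    return 1
}
# e.g. run_lvm vgchange -a y vg01
```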

After I did that, I performed stress testing - I consumed all available
IO with dozens of disktest instances - and my cluster remained alive;
the LVM RA worked as expected. Of course I also raised the timeouts for
the RA ops.

Also, I use my own RA which deliberately does not run LVM commands in
the monitor op, just [ -d /dev/VG ] or [ -e /dev/VG/LV ], which is
absolutely sufficient.
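The monitor check amounts to nothing more than this (function names are
placeholders, not from my RA):

```shell
# Lightweight monitor op: test only the device nodes, no LVM commands
monitor_vg() { [ -d "/dev/$1" ]; }      # VG activated?
monitor_lv() { [ -e "/dev/$1/$2" ]; }   # LV node present?
# e.g. monitor_vg vg01 && monitor_lv vg01 lv_data
```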

BTW, do you use clvm? Unkillable processes sometimes appear when a dlm
lockspace is stuck. If yes, please look for kern_stop on the clvmd
lockspace in the dlm_tool ls output.
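A small helper to spot that condition could look like this (a sketch;
the exact dlm_tool ls output format varies between versions, so the awk
patterns here are assumptions):

```shell
# find_stuck_lockspaces -- read `dlm_tool ls`-style output on stdin and
# print the name of each lockspace whose flags include kern_stop
find_stuck_lockspaces() {
    awk '/^name/ { ls = $2 } /kern_stop/ { print ls }'
}
# e.g. dlm_tool ls | find_stuck_lockspaces
```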

One side note on clvmd - it should be forced to use the corosync stack
instead of openais. I saw big problems with LCK, which is used by
default if the openais modules are loaded by corosync, and the guys on
the corosync/openais list said that LCK is too experimental and not
heavily tested.

Hope this helps,
Vladislav

_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
