Hi,

On Fri, Nov 11, 2011 at 08:25:48AM +0100, Ulrich Windl wrote:
> Hi!
>
> I found an obscure problem having to do with LVM, multipathing, and
> hot-plugged disks:
>
> I have written some RAs that support "hotplugging of SAN disks" via NPIV
> (N_Port ID Virtualization) and the addition and removal of multipath maps.
> On top of that sit LVM and filesystems.
>
> So far, so good. However, I discovered a problem when multiple resources
> are shut down in parallel: the LVM commands (like vgdisplay) access all
> disks that are present, not just the disks that matter. This can lead to
> a race condition where one resource group stops an LVM monitor, then
> shuts down the corresponding multipath map, and finally the NPIV device
> (SCSI unplug). Unfortunately, during that sequence another LVM command
> may access the disks that are about to be removed.
>
> I don't know exactly what happened, but the result was that several
> vgdisplay commands hung (unkillable even with kill -9), multipath
> commands hung (device busy through LVM?), and the device could not be
> removed. It seems some rather global lock is involved that makes more
> and more commands hang.
>
> In the end the machine needed a hard reset to recover.
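One mitigation that sometimes helps here (a sketch only, not tested against your setup; the `mpath` naming pattern is an assumption and must be adapted to your multipath aliases) is to restrict LVM's device scanning in /etc/lvm/lvm.conf so that LVM commands only ever open the multipath maps, never the underlying /dev/sd* paths that are being hot-unplugged:

```
# /etc/lvm/lvm.conf -- sketch; adjust the accept pattern to your
# multipath device naming scheme before use
devices {
    # Accept only device-mapper multipath devices and reject everything
    # else, so vgdisplay/lvs/vgchange never touch the raw SCSI paths
    # that the RA is about to unplug.
    filter = [ "a|^/dev/mapper/mpath.*|", "r|.*|" ]
}
```

Even with such a filter, the stop sequence should deactivate the VG (vgchange -an) before flushing the multipath map (multipath -f) and only then delete the SCSI devices, so nothing holds the map open at removal time.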
I doubt that it's possible to handle this at the RA layer. At least
that's what it sounds like. vgdisplay arguably should not hang in this
situation; it's a read-only command. I'd suggest opening a bugzilla with
your vendor.

Thanks,

Dejan

> I know the HP-UX implementation of LVM, where the kernel knows which
> disks belong to which volume group once the VG is active. Then any
> command related to a specific VG only has to access the disks that are
> actually PVs of that VG (if at all). In the Linux implementation, LVM
> commands seem to scan every block device every time, causing odd effects.
>
> In another test I did, the wall-clock time for "lvs" increased by a
> factor close to 1000 when there was significant load on a few disks. I'm
> talking about execution times of more than 60 seconds for a single
> "lvs". Naturally this causes a monitor failure for several resources,
> but the stop operation would also time out, causing a (needless) node
> fence.
>
> The real reason for that huge increase in execution time is still under
> investigation, but it is absolutely repeatable on a machine with many
> (>50) block devices, lots of RAM (>100G), huge filesystems (>1TB), and
> several huge files (>100GB) being copied.
>
> Regards,
> Ulrich
>
> _______________________________________________
> Linux-HA mailing list
> [email protected]
> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> See also: http://linux-ha.org/ReportingProblems
