>>> Dejan Muhamedagic <[email protected]> wrote on 05.08.2011 at 14:18 in
message <[email protected]>:
> Hi,
>
> On Fri, Aug 05, 2011 at 01:55:25PM +0200, Ulrich Windl wrote:
> > Hi,
> >
> > we run a cluster that has about 30 LVM VGs that are monitored every minute
> > with a timeout interval of 90s. Surprisingly even if the system is in nominal
> > state, the LVM monitor times out.
> >
> > I suspect this has to do with multiple LVM commands being run in parallel
> > like this:
> > # ps ax |grep vg
> > 2014 pts/0 D+ 0:00 vgs
> > 2580 ? D 0:00 vgdisplay -v NFS_C11_IO
> > 2638 ? D 0:00 vgck CBW_DB_BTD
> > 2992 ? D 0:00 vgdisplay -v C11_DB_Exe
> > 3002 ? D 0:00 vgdisplay -v C11_DB_15k
> > 4564 pts/2 S+ 0:00 grep vg
> > # ps ax |grep vg
> > 8095 ? D 0:00 vgck CBW_DB_Exe
> > 8119 ? D 0:00 vgdisplay -v C11_DB_FATA
> > 8194 ? D 0:00 vgdisplay -v NFS_SAP_Exe
> >
> > When I tried a "vgs" manually, it could not be suspended or killed, and it
> > took more than 30 seconds to complete.
> >
> > Thus the LVM monitoring is quite useless as it is now (SLES 11 SP1 x86_64
> > on a machine with lots of disks, RAM and CPUs).
>
> I guess that this is somehow related to the storage. Best to
> report directly to SUSE.
>
Hi!
I suspect that LVM takes an exclusive lock while examining the state. Basically,
vgdisplay on Linux does a stupid thing: it always scans all disks to find PVs.
Compare that with HP-UX LVM, which scans the disks only if you explicitly
request it with vgscan. There, a simple vgdisplay reads the kernel's in-RAM
structures, but you can only vgdisplay VGs that are active (otherwise the
kernel doesn't know them). The PVs belonging to each VG are stored in a file there.
I don't think the disk system is the problem; it's the LVM implementation. A
very quick test series showed that vgdisplay for a named VG that exists takes
0.3 to 0.8 seconds, which is rather slow. Looking for a VG that does not
exist takes 0.8 to 1.5 seconds.
The system in question has 192 "SCSI disks" that are combined into 44 multipath
disks. About half of those are combined into RAID1s, and a few of those RAIDs are
partitioned. All RAIDs have a VG with at least one LV. This gives 72 device
mapper devices. If LVM searches all of those devices, it can take a while
to complete.
While playing around I made an interesting observation: if you use just
"vgdisplay" to display all VGs, the command takes about 0.05s, but when you
specify a name, it takes about 0.7s. Finally, when using awk to pick out the
desired VG, the command is hardly slower than without awk:
# time (vgdisplay | awk '$1 == "VG" && $2 == "Name" && $3 == "dd" { print $3 }')
real 0m0.082s
user 0m0.020s
sys 0m0.012s
# time (vgdisplay | awk '$1 == "VG" && $2 == "Name" && $3 == "sys" { print $3 }')
sys
real 0m0.098s
user 0m0.012s
sys 0m0.020s
# time vgdisplay sys
[...]
real 0m0.063s
user 0m0.020s
sys 0m0.004s
# time vgdisplay sysX
Volume group "sysX" not found
real 0m0.806s
user 0m0.012s
sys 0m0.060s
So the "status" operation as it's implemented now takes much longer to return
"stopped" than it takes to return "started". Maybe someone wants to look into
what terrible things happen when a non-existent VG is passed to vgdisplay...
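For illustration, the awk variant from the timings above could be wrapped up roughly like this. This is only a sketch: the function name vg_listed is made up, it is not what the LVM resource agent does today, and it assumes the VG in question shows up in the output of a plain "vgdisplay".

```shell
# vg_listed NAME  (hypothetical helper, not part of any resource agent)
# Reads "vgdisplay" output on stdin and exits 0 if a line of the form
# "VG Name NAME" is present, 1 otherwise. The awk test is the same one
# used in the timings above; only the exit status handling is added.
vg_listed() {
    awk -v vg="$1" '
        $1 == "VG" && $2 == "Name" && $3 == vg { found = 1 }
        END { exit !found }
    '
}

# Intended use (commented out so the sketch itself never calls LVM):
#   if vgdisplay 2>/dev/null | vg_listed "sys"; then
#       echo "started"
#   else
#       echo "stopped"
#   fi
```

The point is that the name is never handed to vgdisplay itself, so the "stopped" case should cost about as much as the "started" case.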
Regards,
Ulrich