2013/4/22 Ulrich Windl <[email protected]>:
>>>> Lars Marowsky-Bree <[email protected]> wrote on 19.04.2013 at 18:46 in
>>>> message <[email protected]>:
>> On 2013-04-19T16:27:14, Ulrich Windl <[email protected]>
>> wrote:
>>
>> > Hello,
>> >
>> > Using OCFS2 on top of a cLVM-mirrored LV is an absolute no-go for SLES11
>> > SP2:
>>
>> Note that this is unrelated to OCFS2; cLVM2 mirroring is rather slow,
>> since it communicates over the network to keep the dirty bitmaps and
>> locks in sync.
>
> That is the question: if cLVM mirroring floods the communication channel,
> then OCFS2 (which uses the same communication channel, the DLM) will suffer,
> just as the cluster stack will (we're talking about faulty rings then).
>
>>
>> > First, while mirroring the LV ("only" 300GB), access to any of the
>> > involved devices is very slow, but it's not the I/O that's the limit;
>> > I suspect it's "communication":
>> >   PID USER  PR NI  VIRT  RES  SHR S %CPU %MEM    TIME+ P COMMAND
>> > 14883 root  RT  0  535m  11m 7808 R   42  0.0 34:35.24 0 corosync
>> > 16493 root  20  0 33804 2168 1752 R   14  0.0 10:32.29 0 cmirrord
>> >
>> > Shouldn't a busy mirroring job be in "D" state instead of "R" state,
>> > burning CPU?
>>
>> They're not in "D" because they are not waiting on disk IO, but have
>> a lot of network IO and data structure maintenance to handle.
>
> Interesting: while flooding a Gb network, the achieved mirroring rate is
> only about 60MB/s. But we are not mirroring through the network; we mirror
> through 4Gb/s FC (fully redundant fabrics).
>
>>
>> > So besides the inefficient I/O with cLVM there are more issues:
>> > 1) LVM should load-balance between the mirror legs
>>
>> It doesn't, because this simplifies the dirty logging. It always writes
>> to leg 1 first, hence all read requests can always be satisfied from
>> leg 1 without the need for a cluster-wide sync, if leg 1 and leg 2 are
>> already in sync in the IO paths.
>
> See the performance of MD-RAID for motivation: MD-RAID is much faster.
>
>>
>> > 2) LVM should use a leg-internal bitmap to resynchronize the mirror in
>> > a non-stupid way
>>
>> It does use a bitmap for syncing, if you created the lvmirror with a
>> persistent mirrorlog.
>
> That design is broken: if you have two separate storage systems in two
> locations, where do you put the bitmap? In HP-UX (similar to MD-RAID) each
> PV had its own bitmap; with Linux LVM you need a _third_ device to store
> the bitmap. That's nonsense.
>
>>
>> > 3) LVM should mirror from the more recent mirror leg to the outdated
>> > leg, not use a fixed direction.
>>
>> The only situation where this matters is a split brain combined with
>> split IO. That's a situation that even DRBD doesn't handle well, and
>> the resolution that LVM2 mirroring implements is as valid as any.
>
> Yes, DRBD dual-primary also failed in our scenario: manual repair was
> needed. The primary idea of mirroring is that systems keep running if one
> mirror leg fails. And the necessary condition for practical use in an HA
> environment is that once the failed leg returns (assuming an I/O outage)
> the systems still keep running while the data are being synchronized onto
> the stale leg. cLVM brings the system to a practical standstill in this
> situation.
>
>>
>> > So my advice is: Don't use it (for SLES11 SP2).
>>
>> You should not use it if performance is your primary goal for using it,
>> no.
>
> See above. I can only assume cLVM was tested in a "toy environment" with
> either tiny or extremely slow disks, so that the disks limited the
> mirroring speed.
>
>>
>> > I'm somewhat displeased about the situation, because I had a support
>> > request asking exactly whether this setup is a supported configuration,
>> > and it was confirmed.
>>
>> It *is* supported, but cLVM2 mirroring has constraints, especially with
>> regard to performance and flexibility.
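For reference, the "persistent mirrorlog" mentioned above is selected at LV
creation time. A minimal sketch (VG name `vg0` and the PV paths are made-up
placeholders; these commands require real cluster-attached devices and are
shown only to illustrate the options):

```shell
# Classic cluster mirror: 1 extra copy, dirty-region bitmap kept on a
# separate log device ("disk" log) -- this is the third device complained
# about above.
lvcreate --mirrors 1 --mirrorlog disk -L 300G -n lv_shared vg0 \
    /dev/mapper/leg1 /dev/mapper/leg2 /dev/mapper/logdev

# LVM2 can also mirror the log itself onto the mirror legs, avoiding a
# single point of failure for the bitmap (at the cost of extra log writes):
lvcreate --mirrors 1 --mirrorlog mirrored -L 300G -n lv_shared vg0 \
    /dev/mapper/leg1 /dev/mapper/leg2

# "core" keeps the log only in memory: a full resync is needed after every
# deactivation, which is exactly the "stupid" resynchronization behaviour.
```

Note that `--mirrorlog mirrored` still stores the log as a separate sub-LV
rather than as a per-PV bitmap the way MD-RAID does, so it only partially
addresses the two-site objection raised above.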
>>
>> If you can avoid the need for a concurrent cluster mirror, do so: use
>> SAN-based mirroring, use md raid1 if active/passive access is
>> sufficient, consider building an iSCSI server using raid1 and
>> re-exporting via iSCSI for your concurrent IO needs, consider using
>> DRBD, use cLVM2 mirroring but with local activation, etc. They all,
>> alas, have trade-offs.
>>
>> Cluster concurrency is a hard problem. cLVM2 mirroring performance is
>> certainly pretty close to the top of our priority lists, but the battle
>> is not won in a day.
>
> Yes, I had complained about the massive logging of cLVM (which showed
> that it communicates quite a lot (I'd say: way too much)), and the
> solution being applied seems to be disabling the logging. So the
> extensive communication still happens.
>
>>
>> > Now the first proposal regarding the terrible performance was to _try_
>> > SLES11 SP3 beta...
>>
>> The CPU overhead will have improved some, but the basic design of cLVM2
>> mirroring hasn't changed a lot.
>>
>> This is the same upstream and in all distributions; it is not SLES
>> specific.
>
> There were some rumours that Red Hat's LVM is ahead of SUSE's by at least
> one generation...
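To illustrate the md raid1 alternative suggested above: MD keeps a
write-intent bitmap inside each member device, which is the "leg-internal
bitmap" behaviour asked for earlier in the thread. A minimal sketch
(device paths `/dev/mapper/leg1` and `/dev/mapper/leg2` are placeholder
SAN LUNs; this requires real hardware and active/passive access):

```shell
# RAID1 over two SAN LUNs with an internal write-intent bitmap; after a
# leg fails and returns, only regions marked dirty in the bitmap are
# resynced, and the array stays usable meanwhile.
mdadm --create /dev/md0 --level=1 --raid-devices=2 --bitmap=internal \
    /dev/mapper/leg1 /dev/mapper/leg2

# Re-add a leg that was temporarily lost (e.g. an FC outage); the bitmap
# limits the resync to the stale regions:
mdadm /dev/md0 --re-add /dev/mapper/leg2

# Check the resync progress and bitmap state:
cat /proc/mdstat
mdadm --detail /dev/md0
```

The catch, as the post says, is that md raid1 is only safe with the array
assembled on one node at a time (active/passive), so concurrent cluster
access still needs something like re-export via iSCSI on top.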
Just out of curiosity: sources? Isn't cLVM included upstream?
>
> Regards,
> Ulrich
>
> _______________________________________________
> Linux-HA mailing list
> [email protected]
> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> See also: http://linux-ha.org/ReportingProblems

Regards,
--
Ciro Iriarte
http://cyruspy.wordpress.com
