On 3/26/12 5:31 PM, William Seligman wrote:
> On 3/26/12 5:17 PM, William Seligman wrote:
>> On 3/26/12 4:28 PM, emmanuel segura wrote:
>>> and I suggest you start clvmd at boot time
>>>
>>> chkconfig clvmd on
>>
>> I'm afraid this doesn't work. It's as I predicted; when gfs2 starts I get:
>>
>> Mounting GFS2 filesystem (/usr/nevis): invalid device path
>> "/dev/mapper/ADMIN-usr"
>> [FAILED]
>>
>> ... and so on, because the ADMIN volume group was never loaded by clvmd.
>> Without
>> a "vgscan" in there somewhere, the system can't see the volume groups on the
>> drbd resource.
>
> Wait a second... there's an ocf:heartbeat:LVM resource! Testing...
Emmanuel, you did it!
For the sake of future searches, and possibly future documentation, let me start
with my original description of the problem:
> I'm setting up a two-node cman+pacemaker+gfs2 cluster as described in
> "Clusters
> From Scratch." Fencing is through forcibly rebooting a node by cutting and
> restoring its power via UPS.
>
> My fencing/failover tests have revealed a problem. If I gracefully turn off
> one
> node ("crm node standby"; "service pacemaker stop"; "shutdown -r now") all the
> resources transfer to the other node with no problems. If I cut power to one
> node (as would happen if it were fenced), the lsb::clvmd resource on the
> remaining node eventually fails. Since all the other resources depend on
> clvmd,
> all the resources on the remaining node stop and the cluster is left with
> nothing running.
>
> I've traced why the lsb::clvmd fails: The monitor/status command includes
> "vgdisplay", which hangs indefinitely. Therefore the monitor will always
> time-out.
>
> So this isn't a problem with pacemaker, but with clvmd/dlm: If a node is cut
> off, the cluster isn't handling it properly. Has anyone on this list seen this
> before? Any ideas?
>
> Details:
>
> versions:
> Redhat Linux 6.2 (kernel 2.6.32)
> cman-3.0.12.1
> corosync-1.4.1
> pacemaker-1.1.6
> lvm2-2.02.87
> lvm2-cluster-2.02.87
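As an aside, the hang is easy to demonstrate in isolation: wrapping the monitor's vgdisplay call in coreutils `timeout` makes it fail fast instead of blocking the monitor forever. Here's a minimal sketch of the idea; `sleep 30` stands in for a vgdisplay call blocked on a dlm lock, and the 1-second bound is an arbitrary illustrative choice, not something from my configuration:

```shell
# Bound a potentially-hanging command with coreutils `timeout`.
# `sleep 30` plays the role of a vgdisplay blocked on a dlm lock.
if timeout 1 sleep 30; then
    echo "command completed"
else
    # timeout exits 124 when it kills the command
    echo "command hung or failed (exit $?)"
fi
# -> command hung or failed (exit 124)
```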
The problem is that clvmd on the surviving node will hang if there's a substantial
period during which the other node is running cman but not yet clvmd. I never
tracked down why this happens, but there's a practical solution: minimize any
interval for which that condition holds. To ensure this, take clvmd outside the
resource manager's control:
chkconfig cman on
chkconfig clvmd on
chkconfig pacemaker on
On RHEL6.2, these services will be started in the above order; clvmd will start
within a few seconds after cman.
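That ordering comes from the SysV start priorities in each init script's chkconfig header: the rc symlinks sort lexically by priority number, so lower-numbered services start first. A quick sketch of the mechanism (the priority numbers below are illustrative stand-ins, not read from an actual RHEL 6.2 box):

```shell
# rc3.d start links sort lexically, so S21... starts before S24...,
# which starts before S99... The numbers are illustrative only.
printf '%s\n' S99pacemaker S21cman S24clvmd | sort
# -> S21cman
# -> S24clvmd
# -> S99pacemaker
```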
Here's my cluster.conf <http://pastebin.com/GUr0CEgZ> and the output of "crm
configure show" <http://pastebin.com/f9D4Ui5Z>. The key lines from the latter
are:
primitive AdminDrbd ocf:linbit:drbd \
        params drbd_resource="admin"
primitive AdminLvm ocf:heartbeat:LVM \
        params volgrpname="ADMIN" \
        op monitor interval="30" timeout="100" depth="0"
primitive Gfs2 lsb:gfs2
group VolumeGroup AdminLvm Gfs2
ms AdminClone AdminDrbd \
        meta master-max="2" master-node-max="1" \
        clone-max="2" clone-node-max="1" \
        notify="true" interleave="true"
clone VolumeClone VolumeGroup \
        meta interleave="true"
colocation Volume_With_Admin inf: VolumeClone AdminClone:Master
order Admin_Before_Volume inf: AdminClone:promote VolumeClone:start
What I learned: if one is going to extend the example in "Clusters From Scratch"
to include logical volumes, one must start clvmd at boot time, and manage any
volume groups with ocf:heartbeat:LVM resources that start before gfs2.
Note the long timeout on the ocf:heartbeat:LVM resource. This is a good idea
because, during the boot of the crashed node, there'll still be an interval of a
few seconds when cman will be running but clvmd won't be. During my tests, the
LVM monitor would fail if it checked during that interval with a timeout that
was shorter than it took clvmd to start on the crashed node. This was annoying;
all resources dependent on AdminLvm would be stopped until AdminLvm recovered (a
few more seconds). Increasing the timeout avoids this.
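If you started with a shorter timeout, it can be raised on the existing resource with the crm shell. This is a sketch of one way to do it (the `edit` subcommand is crmsh syntax; check it against your version before relying on it):

```shell
# Open the AdminLvm primitive in $EDITOR and raise the monitor timeout,
# e.g. change the op line to:
#   op monitor interval="30" timeout="100" depth="0"
crm configure edit AdminLvm
# Check the result and push it to the CIB:
crm configure verify
crm configure commit
```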
It also means that during any recovery procedure on the crashed node in which
I've disabled these services at boot, I have to minimize the interval between
starting cman and starting clvmd; e.g.:
service drbd start # ... and fix any split-brain problems or whatever
service cman start; service clvmd start # put on one line
service pacemaker start
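The steps above can be wrapped in a small helper so the cman-to-clvmd gap stays as short as possible. This is a hedged sketch, not something from my cluster: the function name and the explicit pause for fixing split-brain are my own additions; the service names are the ones used above.

```shell
# Sketch of a manual-recovery helper. Defining it as a function lets
# you source this file and run recover_node when ready.
recover_node() {
    service drbd start || return 1
    # Resolve any DRBD split-brain by hand before continuing.
    printf 'Fix any split-brain now, then press Enter: ' >&2
    read _dummy
    # Start cman and clvmd back to back to keep the window where
    # cman is up but clvmd is not as short as possible.
    service cman start && service clvmd start || return 1
    service pacemaker start
}
```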
I thank everyone on this list who was patient with me as I pounded on this
problem for two weeks!
--
Bill Seligman | Phone: (914) 591-2823
Nevis Labs, Columbia Univ | mailto://[email protected]
PO Box 137 |
Irvington NY 10533 USA | http://www.nevis.columbia.edu/~seligman/
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
