On 3/27/12 4:52 AM, emmanuel segura wrote:

> So now your cluster is OK?

*Laughs* No! There's another problem I have to solve. But it's completely
unrelated to this one. I'll work on it some more, and if I can't solve it I'll
start a new thread.

Thanks for asking, Emmanuel. (I want to prove I can spell your name correctly!)

> On 27 March 2012 00:33, William Seligman <[email protected]>
> wrote:
> 
>> On 3/26/12 5:31 PM, William Seligman wrote:
>>> On 3/26/12 5:17 PM, William Seligman wrote:
>>>> On 3/26/12 4:28 PM, emmanuel segura wrote:
>>
>>>>> and I suggest you start clvmd at boot time
>>>>>
>>>>> chkconfig clvmd on
>>>>
>>>> I'm afraid this doesn't work. It's as I predicted; when gfs2 starts I get:
>>>>
>>>> Mounting GFS2 filesystem (/usr/nevis): invalid device path 
>>>> "/dev/mapper/ADMIN-usr"
>>>>                                                            [FAILED]
>>>>
>>>> ... and so on, because the ADMIN volume group was never loaded by 
>>>> clvmd. Without a "vgscan" in there somewhere, the system can't see the
>>>> volume groups on the drbd resource.
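>>>>
>>>> (For the record, the manual workaround at that point is just the usual
>>>> LVM activation sequence; a sketch, assuming clvmd is up and "ADMIN" is
>>>> the volume group on the drbd device:
>>>>
>>>> vgscan              # rescan block devices for volume groups
>>>> vgchange -ay ADMIN  # activate the VG so /dev/mapper/ADMIN-* appears
>>>> service gfs2 start  # retry the mounts
>>>> )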
>>>
>>> Wait a second... there's an ocf:heartbeat:LVM resource! Testing...
>>
>> Emannuel, you did it!
>>
>> For the sake of future searches, and possibly future documentation, let me 
>> start with my original description of the problem:
>>
>>> I'm setting up a two-node cman+pacemaker+gfs2 cluster as described in
>>> "Clusters From Scratch." Fencing is through forcibly rebooting a node by
>>> cutting and restoring its power via UPS.
>>> 
>>> My fencing/failover tests have revealed a problem. If I gracefully turn
>>> off one node ("crm node standby"; "service pacemaker stop"; "shutdown -r
>>> now") all the resources transfer to the other node with no problems. If I
>>> cut power to one node (as would happen if it were fenced), the lsb::clvmd
>>> resource on the remaining node eventually fails. Since all the other
>>> resources depend on clvmd, all the resources on the remaining node stop
>>> and the cluster is left with nothing running.
>>> 
>>> I've traced why the lsb::clvmd fails: The monitor/status command
>>> includes "vgdisplay", which hangs indefinitely. Therefore the monitor
>>> will always time-out.
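>>>
>>> (A quick way to confirm that kind of hang, as a sketch, is to run the
>>> same command the init script's status action uses under coreutils'
>>> timeout; exit status 124 means timeout had to kill it:
>>>
>>> timeout 30 vgdisplay; echo "exit status: $?"
>>> )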
>>> 
>>> So this isn't a problem with pacemaker, but with clvmd/dlm: If a node is
>>> cut off, the cluster isn't handling it properly. Has anyone on this list
>>> seen this before? Any ideas?
>>>
>>> Details:
>>>
>>> versions:
>>> Red Hat Enterprise Linux 6.2 (kernel 2.6.32)
>>> cman-3.0.12.1
>>> corosync-1.4.1
>>> pacemaker-1.1.6
>>> lvm2-2.02.87
>>> lvm2-cluster-2.02.87
>>
>> The problem is that clvmd on the main node will hang if there's a
>> substantial period of time during which the other node comes back up
>> running cman but not clvmd. I never tracked down why this happens, but
>> there's a practical solution: minimize any interval for which that would
>> be true. To ensure this, take clvmd outside the resource manager's control:
>>
>> chkconfig cman on
>> chkconfig clvmd on
>> chkconfig pacemaker on
>>
>> On RHEL6.2, these services will be started in the above order; clvmd will 
>> start within a few seconds after cman.
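>>
>> (To check that the init scripts really are ordered that way, one can
>> list the runlevel symlinks; a sketch, with S-numbers that may differ on
>> your machine:
>>
>> ls /etc/rc3.d/ | egrep 'cman|clvmd|pacemaker'
>>
>> The S-numbers set the start order, so cman must sort before clvmd.)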
>> 
>> Here's my cluster.conf <http://pastebin.com/GUr0CEgZ> and the output of 
>> "crm configure show" <http://pastebin.com/f9D4Ui5Z>. The key lines from
>> the latter are:
>>
>> primitive AdminDrbd ocf:linbit:drbd \
>>        params drbd_resource="admin"
>> primitive AdminLvm ocf:heartbeat:LVM \
>>        params volgrpname="ADMIN" \
>>        op monitor interval="30" timeout="100" depth="0"
>> primitive Gfs2 lsb:gfs2
>> group VolumeGroup AdminLvm Gfs2
>> ms AdminClone AdminDrbd \
>>        meta master-max="2" master-node-max="1" \
>>        clone-max="2" clone-node-max="1" \
>>        notify="true" interleave="true"
>> clone VolumeClone VolumeGroup \
>>        meta interleave="true"
>> colocation Volume_With_Admin inf: VolumeClone AdminClone:Master
>> order Admin_Before_Volume inf: AdminClone:promote VolumeClone:start
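>>
>> (If you're typing this in rather than loading a saved file, "crm
>> configure" accepts the lines above interactively; these commands then
>> sanity-check and watch the result:
>>
>> crm configure verify   # check the configuration for errors
>> crm_mon -1             # one-shot view of resource status
>> )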
>>
>> What I learned: If one is going to extend the example in "Clusters From
>> Scratch" to include logical volumes, one must start clvmd at boot time and
>> manage each volume group with an ocf:heartbeat:LVM resource that starts
>> before gfs2.
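>>
>> (The lsb:gfs2 init script mounts whatever /etc/fstab marks as gfs2. My
>> entry looks roughly like the following; the mount options are a matter
>> of taste:
>>
>> /dev/mapper/ADMIN-usr  /usr/nevis  gfs2  noatime  0 0
>> )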
>> 
>> Note the long timeout on the ocf:heartbeat:LVM resource. This is a good
>> idea because, during the boot of the crashed node, there'll still be an
>> interval of a few seconds when cman will be running but clvmd won't be.
>> During my tests, the LVM monitor would fail if it fired during that
>> interval with a timeout shorter than the time clvmd took to start on the
>> crashed node. This was annoying; all resources dependent on AdminLvm would
>> be stopped until AdminLvm recovered (a few seconds later). Increasing the
>> timeout avoids this.
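>>
>> (To raise the timeout on an already-defined resource, a sketch using
>> the crm shell:
>>
>> crm configure edit AdminLvm   # change timeout="100" as needed, then save
>> )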
>> 
>> It also means that during any recovery procedure for which I've turned off
>> all the services on the crashed node at boot, I have to minimize the
>> interval between starting cman and starting clvmd; e.g.,
>>
>> service drbd start # ... and fix any split-brain problems or whatever
>> service cman start; service clvmd start # put on one line
>> service pacemaker start
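>>
>> ("Fix any split-brain problems" above usually means the standard drbd
>> 8.3 recovery; a sketch, assuming "admin" is the resource and this node's
>> changes are the ones to throw away:
>>
>> drbdadm secondary admin
>> drbdadm -- --discard-my-data connect admin
>> # ...and on the surviving node:  drbdadm connect admin
>> )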
>>
>> I thank everyone on this list who was patient with me as I pounded on this
>> problem for two weeks!

-- 
Bill Seligman             | Phone: (914) 591-2823
Nevis Labs, Columbia Univ | mailto://[email protected]
PO Box 137                |
Irvington NY 10533 USA    | http://www.nevis.columbia.edu/~seligman/
