On 3/26/12 4:28 PM, emmanuel segura wrote:
> Sorry William, I can't post my config now because I'm at home, not at
> work.
>
> I think it's not a problem if clvmd starts before drbd, because clvmd
> doesn't need any devices in order to start.
>
> That's the point; I hope that's clear.
>
> The introduction of pacemaker into the Red Hat cluster was intended to
> replace rgmanager, not the whole cluster stack.
>
> So I suggest you start clvmd at boot time:
>
> chkconfig clvmd on
I'm afraid this doesn't work. It's as I predicted; when gfs2 starts I get:
Mounting GFS2 filesystem (/usr/nevis): invalid device path
"/dev/mapper/ADMIN-usr"
[FAILED]
... and so on, because the ADMIN volume group was never loaded by clvmd. Without
a "vgscan" in there somewhere, the system can't see the volume groups on the
drbd resource.
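
Something like this manual sequence is what any boot-time ordering would have
to reproduce (a sketch of the recovery steps, not my actual init scripts;
"admin" is the drbd resource and ADMIN the volume group that lives on it):

  drbdadm up admin            # attach and connect the drbd resource
  drbdadm primary admin       # promote it so the backing device is writable
  service clvmd start         # clustered locking for LVM
  vgscan                      # rescan so LVM sees the VGs on the drbd device
  vgchange -ay ADMIN          # activate the logical volumes in ADMIN
  mount /usr/nevis            # /dev/mapper/ADMIN-usr exists only after this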
> Sorry for my bad english :-) i can from a spanish country and all days i
> speak Italian
I'm sorry that I don't speak more languages! You're the one who's helping me;
it's my task to learn and understand. Certainly your English is better than my
French or Russian.
> On 26 March 2012 22:04, William Seligman <[email protected]> wrote:
>
>> On 3/26/12 3:48 PM, emmanuel segura wrote:
>>> I know it's normal that fence_node doesn't work, because the fence
>>> request must be redirected to pacemaker's stonith.
>>>
>>> I think calling the cluster agents from rgmanager is a really ugly
>>> thing; I've never seen a cluster like this.
>>> ==============================================================
>>> If I understand "Pacemaker Explained" <http://bit.ly/GR5WEY> and how I'd
>>> invoke clvmd from cman <http://bit.ly/H6ZbKg>, the clvmd script that
>>> would be invoked by either HA resource manager is exactly the same:
>>> /etc/init.d/clvmd.
>>> ==============================================================
>>>
>>> clvmd doesn't need to be called from rgmanager in the cluster
>>> configuration.
>>>
>>> This is the boot sequence of the Red Hat daemons:
>>>
>>> 1: cman, 2: clvmd, 3: rgmanager
>>>
>>> and if you don't want to use rgmanager, you just replace rgmanager
>>> (with pacemaker).
>>
>> I'm sorry, but I don't think I understand what you're suggesting. Do you
>> suggest that I start clvmd at boot? That won't work; clvmd won't see the
>> volume groups on drbd until drbd is started and promoted to primary.
>>
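
(To be concrete about the ordering I meant there: in pacemaker it's expressed
with constraints, roughly like the following crm shell sketch. The resource
names here are placeholders, not the ones from my actual configuration:)

  # the clvmd clone may start only after the drbd master has been promoted
  order o_drbd_before_clvmd inf: ms_drbd_admin:promote cl_clvmd:start
  # and it must run on the node where drbd is master
  colocation c_clvmd_on_drbd inf: cl_clvmd ms_drbd_admin:Master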
>> May I ask you to post your own cluster.conf on pastebin.com so I can see
>> how you do it? Along with "crm configure show" if that's relevant for
>> your cluster?
>>
>>> On 26 March 2012 19:21, William Seligman <[email protected]> wrote:
>>>
>>>> On 3/24/12 5:40 PM, emmanuel segura wrote:
>>>>> I think it's better if you use clvmd with cman.
>>>>>
>>>>> I don't know why you use the lsb script for clvmd.
>>>>>
>>>>> On Red Hat, clvmd needs cman, and you're trying to run it with
>>>>> pacemaker. I'm not sure this is the problem, but this type of
>>>>> configuration is very strange.
>>>>>
>>>>> I made a virtual cluster with kvm and I didn't find any problems.
>>>>
>>>> While I appreciate the advice, it's not immediately clear that trying
>>>> to eliminate pacemaker would do me any good. Perhaps someone can
>>>> demonstrate the error in my reasoning:
>>>>
>>>> If I understand "Pacemaker Explained" <http://bit.ly/GR5WEY> and how
>>>> I'd invoke clvmd from cman <http://bit.ly/H6ZbKg>, the clvmd script
>>>> that would be invoked by either HA resource manager is exactly the
>>>> same: /etc/init.d/clvmd.
>>>>
>>>> If I tried to use cman instead of pacemaker, I'd be cutting myself off
>>>> from the pacemaker features that cman/rgmanager does not yet have
>>>> available, such as pacemaker's symlink, exportfs, and clonable IPaddr2
>>>> resources.
>>>>
>>>> I recognize I've got a strange problem. Given that fence_node doesn't
>>>> work but stonith_admin does, I strongly suspect that the problem is
>>>> caused by the behavior of my fencing agent, not the use of pacemaker
>>>> versus rgmanager, nor by how clvmd is being started.
>>>>
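
(For anyone skimming the thread, these are the two code paths being compared,
run from a shell with the peer's node name as in my cluster:)

  # through cman: uses the fence devices/agents defined in cluster.conf
  fence_node orestes-corosync.nevis.columbia.edu

  # through pacemaker: asks stonith-ng to run the stonith resources
  stonith_admin --fence orestes-corosync.nevis.columbia.edu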
>>>>> On 24 March 2012 13:09, William Seligman <[email protected]> wrote:
>>>>>
>>>>>> On 3/24/12 4:47 AM, emmanuel segura wrote:
>>>>>>> How do you configure clvmd?
>>>>>>>
>>>>>>> with cman or with pacemaker?
>>>>>>
>>>>>> Pacemaker. Here's the output of 'crm configure show':
>>>>>> <http://pastebin.com/426CdVwN>
>>>>>>
>>>>>>> On 23 March 2012 22:14, William Seligman <[email protected]> wrote:
>>>>>>>
>>>>>>>> On 3/23/12 5:03 PM, emmanuel segura wrote:
>>>>>>>>
>>>>>>>>> Sorry, but I would like to know if you can show me your
>>>>>>>>> /etc/cluster/cluster.conf.
>>>>>>>>
>>>>>>>> Here it is: <http://pastebin.com/GUr0CEgZ>
>>>>>>>>
>>>>>>>>> On 23 March 2012 21:50, William Seligman <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> On 3/22/12 2:43 PM, William Seligman wrote:
>>>>>>>>>>> On 3/20/12 4:55 PM, Lars Ellenberg wrote:
>>>>>>>>>>>> On Fri, Mar 16, 2012 at 05:06:04PM -0400, William Seligman wrote:
>>>>>>>>>>>>> On 3/16/12 12:12 PM, William Seligman wrote:
>>>>>>>>>>>>>> On 3/16/12 7:02 AM, Andreas Kurz wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> s----- ... DRBD suspended I/O, most likely because of its
>>>>>>>>>>>>>>> fencing policy. For valid dual-primary setups you have to use
>>>>>>>>>>>>>>> the "resource-and-stonith" policy and a working "fence-peer"
>>>>>>>>>>>>>>> handler. In this mode I/O is suspended until fencing of the
>>>>>>>>>>>>>>> peer was successful. The question is why the peer does _not_
>>>>>>>>>>>>>>> also suspend its I/O, because obviously fencing was not
>>>>>>>>>>>>>>> successful ...
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> So with a correct DRBD configuration one of your nodes should
>>>>>>>>>>>>>>> already have been fenced because of connection loss between
>>>>>>>>>>>>>>> nodes (on the drbd replication link).
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> You can use e.g. that nice fencing script:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> http://goo.gl/O4N8f
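
(For reference, the drbd.conf stanza Andreas' advice corresponds to looks
roughly like this; a sketch only, assuming 8.3 syntax, with "admin" being my
resource and the handler path being wherever the script from his link gets
installed:)

  resource admin {
    disk {
      fencing resource-and-stonith;   # suspend I/O until the peer is fenced
    }
    handlers {
      # hypothetical install path for the fencing script
      fence-peer "/usr/lib/drbd/stonith_admin-fence-peer.sh";
    }
  }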
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> This is the output of "drbdadm dump admin":
>>>>>>>>>>>>>> <http://pastebin.com/kTxvHCtx>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> So I've got resource-and-stonith. I gather from an earlier
>>>>>>>>>>>>>> thread that obliterate-peer.sh is more-or-less equivalent in
>>>>>>>>>>>>>> functionality with stonith_admin_fence_peer.sh:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> <http://www.gossamer-threads.com/lists/linuxha/users/78504#78504>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> At the moment I'm pursuing the possibility that I'm returning
>>>>>>>>>>>>>> the wrong return codes from my fencing agent:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> <http://www.gossamer-threads.com/lists/linuxha/users/78572>
>>>>>>>>>>>>>
>>>>>>>>>>>>> I cleaned up my fencing agent, making sure its return code
>>>>>>>>>>>>> matched those returned by other agents in /usr/sbin/fence_*,
>>>>>>>>>>>>> and allowing for some delay issues in reading the UPS status.
>>>>>>>>>>>>> But...
>>>>>>>>>>>>>
>>>>>>>>>>>>>> After that, I'll look at another suggestion with lvm.conf:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> <http://www.gossamer-threads.com/lists/linuxha/users/78796#78796>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Then I'll try DRBD 8.4.1. Hopefully one of these is the source
>>>>>>>>>>>>>> of the issue.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Failure on all three counts.
>>>>>>>>>>>>
>>>>>>>>>>>> May I suggest you double check the permissions on your fence
>>>>>>>>>>>> peer script?
>>>>>>>>>>>> I suspect you may simply have forgotten the "chmod +x".
>>>>>>>>>>>>
>>>>>>>>>>>> Test with "drbdadm fence-peer minor-0" from the command line.
>>>>>>>>>>>
>>>>>>>>>>> I still haven't solved the problem, but this advice has gotten
>>>>>>>>>>> me further than before.
>>>>>>>>>>>
>>>>>>>>>>> First, Lars was correct: I did not have execute permissions set
>>>>>>>>>>> on my fence peer scripts. (D'oh!) I turned them on, but that did
>>>>>>>>>>> not change anything: cman+clvmd still hung on the vgdisplay
>>>>>>>>>>> command if I crashed the peer node.
>>>>>>>>>>>
>>>>>>>>>>> I started up both nodes again (cman+pacemaker+drbd+clvmd) and
>>>>>>>>>>> tried Lars' suggested command. I didn't save the response for
>>>>>>>>>>> this message (d'oh again!) but it said that the fence-peer
>>>>>>>>>>> script had failed.
>>>>>>>>>>>
>>>>>>>>>>> Hmm. The peer was definitely shutting down, so my fencing script
>>>>>>>>>>> is working. I went over it, comparing the return codes to those
>>>>>>>>>>> of the existing scripts, and made some changes. Here's my
>>>>>>>>>>> current script: <http://pastebin.com/nUnYVcBK>.
>>>>>>>>>>>
>>>>>>>>>>> Up until now my fence-peer scripts had either been Lon
>>>>>>>>>>> Hohberger's obliterate-peer.sh or Digimer's rhcs_fence. I
>>>>>>>>>>> decided to try stonith_admin-fence-peer.sh that Andreas Kurz
>>>>>>>>>>> recommended; unlike the first two scripts, which fence using
>>>>>>>>>>> fence_node, the latter script just calls stonith_admin.
>>>>>>>>>>>
>>>>>>>>>>> When I tried the stonith_admin-fence-peer.sh script, it worked:
>>>>>>>>>>>
>>>>>>>>>>> # drbdadm fence-peer minor-0
>>>>>>>>>>> stonith_admin-fence-peer.sh[10886]: stonith_admin successfully
>>>>>>>>>>> fenced peer orestes-corosync.nevis.columbia.edu.
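
(The exit codes are the whole interface here: DRBD takes the fence-peer
handler's exit status as the verdict, and 7 is the "peer was stonithed"
answer that lets I/O resume. A minimal sketch of the stonith_admin approach,
not the actual script from the pastebin, and assuming DRBD hands the peer's
name to the handler in DRBD_PEER, as the newer handler scripts expect:)

  #!/bin/sh
  # fence-peer handler sketch: fence the DRBD peer through pacemaker.
  # Exit 7 tells DRBD the peer was stonithed; anything else keeps I/O frozen.
  if stonith_admin --fence "$DRBD_PEER"; then
      exit 7
  fi
  exit 1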
>>>>>>>>>>>
>>>>>>>>>>> Power was cut on the peer, the remaining node stayed up. Then I
>>>>>>>>>>> brought up the peer with:
>>>>>>>>>>>
>>>>>>>>>>> stonith_admin -U orestes-corosync.nevis.columbia.edu
>>>>>>>>>>>
>>>>>>>>>>> BUT: When the restored peer came up and started to run cman,
>>>>>>>>>>> the clvmd hung on the main node again.
>>>>>>>>>>>
>>>>>>>>>>> After cycling through some more tests, I found that if I
>>>>>>>>>>> brought down the peer with drbdadm, then brought the peer up
>>>>>>>>>>> with no HA services, then started drbd and then cman, the
>>>>>>>>>>> cluster remained intact.
>>>>>>>>>>>
>>>>>>>>>>> If I crashed the peer, the scheme in the previous paragraph
>>>>>>>>>>> didn't work. I bring up drbd, check that the disks are both
>>>>>>>>>>> UpToDate, then bring up cman. At that point the vgdisplay on
>>>>>>>>>>> the main node takes so long to run that clvmd will time out:
>>>>>>>>>>>
>>>>>>>>>>> # vgdisplay
>>>>>>>>>>>   Error locking on node orestes-corosync.nevis.columbia.edu:
>>>>>>>>>>>   Command timed out
>>>>>>>>>>>
>>>>>>>>>>> I timed how long it took vgdisplay to run. I might be able to
>>>>>>>>>>> work around this by setting the timeout on my clvmd resource to
>>>>>>>>>>> 300s, but that seems to be a band-aid for an underlying problem.
>>>>>>>>>>> Any suggestions on what else I could check?
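
(If I do go the band-aid route, it would look something like this in the crm
shell; "clvmd" is a hypothetical resource name, not necessarily what is in
my configuration:)

  crm configure show clvmd    # inspect the current op timeouts
  crm configure edit clvmd    # then raise them, e.g.:
                              #   op start timeout="300s"
                              #   op monitor interval="30s" timeout="300s"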
>>>>>>>>>>
>>>>>>>>>> I've done some more tests. Still no solution, just an
>>>>>>>>>> observation: the "death mode" appears to be:
>>>>>>>>>>
>>>>>>>>>> - Two nodes running cman+pacemaker+drbd+clvmd.
>>>>>>>>>> - Take one node down = one remaining node running
>>>>>>>>>>   cman+pacemaker+drbd+clvmd.
>>>>>>>>>> - Start up the dead node. If it ever gets into a state in which
>>>>>>>>>>   it's running cman but not clvmd, clvmd on the uncrashed node
>>>>>>>>>>   hangs.
>>>>>>>>>> - Conversely, if I bring up drbd, make it primary, and start
>>>>>>>>>>   cman+clvmd, there's no problem on the uncrashed node.
>>>>>>>>>>
>>>>>>>>>> My guess is that clvmd is getting the number of nodes it expects
>>>>>>>>>> from cman. When the formerly-dead node starts running cman, the
>>>>>>>>>> number of cluster nodes goes to 2 (I checked with 'cman_tool
>>>>>>>>>> status') but the number of nodes running clvmd is still 1, hence
>>>>>>>>>> the hang.
>>>>>>>>>>
>>>>>>>>>> Does this guess make sense?
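
(A way to test that guess, sketched from memory on a RHEL 6 cman stack; the
commands exist, though the exact output varies by version:)

  cman_tool status    # "Nodes:" is the member count cman reports
  cman_tool nodes     # per-node view of cluster membership
  dlm_tool ls         # the clvmd lockspace shows up once clvmd has joined
  group_tool ls       # compare membership of the fence and dlm groups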
--
Bill Seligman | Phone: (914) 591-2823
Nevis Labs, Columbia Univ | mailto://[email protected]
PO Box 137 |
Irvington NY 10533 USA | http://www.nevis.columbia.edu/~seligman/
