On 3/16/12 4:53 AM, emmanuel segura wrote:

> For the lvm hang, you can use this in your /etc/lvm/lvm.conf:
> 
> ignore_suspended_devices = 1
> 
> because I saw this in the lvm log:
> 
> ===============================================
> and then it hangs. Comparing the two, it looks like it can't close
> /dev/drbd0
> ===============================================

No, this does not prevent the hang. I tried with both DRBD 8.3.12 and 8.4.1.
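
For reference, the exact change I tested (a minimal sketch; the setting
lives in the devices section of /etc/lvm/lvm.conf):

devices {
    # Treat devices whose device-mapper tables are suspended as if
    # they weren't there, instead of blocking on I/O to them.
    ignore_suspended_devices = 1
}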

> On 15 March 2012 23:50, William Seligman <[email protected]>
> wrote:
> 
>> On 3/15/12 6:07 PM, William Seligman wrote:
>>> On 3/15/12 6:05 PM, William Seligman wrote:
>>>> On 3/15/12 4:57 PM, emmanuel segura wrote:
>>>>
>>>>> We can try to understand what happens when clvmd hangs.
>>>>>
>>>>> Edit /etc/lvm/lvm.conf, change level = 7 in the log section, and
>>>>> uncomment this line:
>>>>>
>>>>> file = "/var/log/lvm2.log"
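>>>>>
>>>>> That is, roughly (a minimal sketch of the log section; other
>>>>> settings left at their defaults):
>>>>>
>>>>> log {
>>>>>     level = 7                     # debug level, 7 = most verbose
>>>>>     file = "/var/log/lvm2.log"    # debug output lands here
>>>>> }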
>>>>
>>>> Here's the tail end of the file (the original is 1.6M). Because there are
>>>> no timestamps in the log, it's hard for me to point you to the point where
>>>> I crashed the other system. I think (though I'm not sure) that the crash
>>>> happened after the last occurrence of
>>>>
>>>> cache/lvmcache.c:1484   Wiping internal VG cache
>>>>
>>>> Honestly, it looks like a wall of text to me. Does it suggest anything
>>>> to you?
>>>
>>> Maybe it would help if I included the link to the pastebin where I put
>>> the output: <http://pastebin.com/8pgW3Muw>
>>
>> Could the problem be with lvm+drbd?
>>
>> In lvm2.log, I see this sequence of lines pre-crash:
>>
>> device/dev-io.c:535   Opened /dev/md0 RO O_DIRECT
>> device/dev-io.c:271   /dev/md0: size is 1027968 sectors
>> device/dev-io.c:137   /dev/md0: block size is 1024 bytes
>> device/dev-io.c:588   Closed /dev/md0
>> device/dev-io.c:271   /dev/md0: size is 1027968 sectors
>> device/dev-io.c:535   Opened /dev/md0 RO O_DIRECT
>> device/dev-io.c:137   /dev/md0: block size is 1024 bytes
>> device/dev-io.c:588   Closed /dev/md0
>> filters/filter-composite.c:31   Using /dev/md0
>> device/dev-io.c:535   Opened /dev/md0 RO O_DIRECT
>> device/dev-io.c:137   /dev/md0: block size is 1024 bytes
>> label/label.c:186   /dev/md0: No label detected
>> device/dev-io.c:588   Closed /dev/md0
>> device/dev-io.c:535   Opened /dev/drbd0 RO O_DIRECT
>> device/dev-io.c:271   /dev/drbd0: size is 5611549368 sectors
>> device/dev-io.c:137   /dev/drbd0: block size is 4096 bytes
>> device/dev-io.c:588   Closed /dev/drbd0
>> device/dev-io.c:271   /dev/drbd0: size is 5611549368 sectors
>> device/dev-io.c:535   Opened /dev/drbd0 RO O_DIRECT
>> device/dev-io.c:137   /dev/drbd0: block size is 4096 bytes
>> device/dev-io.c:588   Closed /dev/drbd0
>>
>> I interpret this: look at /dev/md0, get some info, close it; look at
>> /dev/drbd0, get some info, close it.
>>
>> Post-crash, I see:
>>
>> device/dev-io.c:535   Opened /dev/md0 RO O_DIRECT
>> device/dev-io.c:271   /dev/md0: size is 1027968 sectors
>> device/dev-io.c:137   /dev/md0: block size is 1024 bytes
>> device/dev-io.c:588   Closed /dev/md0
>> device/dev-io.c:271   /dev/md0: size is 1027968 sectors
>> device/dev-io.c:535   Opened /dev/md0 RO O_DIRECT
>> device/dev-io.c:137   /dev/md0: block size is 1024 bytes
>> device/dev-io.c:588   Closed /dev/md0
>> filters/filter-composite.c:31   Using /dev/md0
>> device/dev-io.c:535   Opened /dev/md0 RO O_DIRECT
>> device/dev-io.c:137   /dev/md0: block size is 1024 bytes
>> label/label.c:186   /dev/md0: No label detected
>> device/dev-io.c:588   Closed /dev/md0
>> device/dev-io.c:535   Opened /dev/drbd0 RO O_DIRECT
>> device/dev-io.c:271   /dev/drbd0: size is 5611549368 sectors
>> device/dev-io.c:137   /dev/drbd0: block size is 4096 bytes
>>
>> ... and then it hangs. Comparing the two, it looks like it can't close
>> /dev/drbd0.
>>
>> If I look at /proc/drbd when I crash one node, I see this:
>>
>> # cat /proc/drbd
>> version: 8.3.12 (api:88/proto:86-96)
>> GIT-hash: e2a8ef4656be026bbae540305fcb998a5991090f build by
>> [email protected], 2012-02-28 18:01:34
>>  0: cs:WFConnection ro:Primary/Unknown ds:UpToDate/DUnknown C s-----
>>    ns:7000064 nr:0 dw:0 dr:7049728 al:0 bm:516 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0
>>
>>
>> If I look at /proc/drbd when I bring down one node gracefully ("crm node
>> standby"), I get this:
>>
>> # cat /proc/drbd
>> version: 8.3.12 (api:88/proto:86-96)
>> GIT-hash: e2a8ef4656be026bbae540305fcb998a5991090f build by
>> [email protected], 2012-02-28 18:01:34
>>  0: cs:WFConnection ro:Primary/Unknown ds:UpToDate/Outdated C r-----
>>    ns:7000064 nr:40 dw:40 dr:7036496 al:0 bm:516 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0
>>
>> Could it be that drbd can't respond to certain requests from lvm if the
>> state of the peer is DUnknown instead of Outdated?
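>>
>> (Side note: the flags also differ: "s-----" after the crash means drbd
>> has suspended I/O, while "r-----" after a graceful standby means I/O is
>> still running. The peer-disk state can also be queried directly with
>> drbdadm; e.g., assuming the resource is named r0 as a placeholder:
>>
>> # drbdadm dstate r0
>> UpToDate/DUnknown
>>
>> which matches the ds: field above.)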
>>
>>>>> On 15 March 2012 20:50, William Seligman
>>>>> <[email protected]> wrote:
>>>>>
>>>>>> On 3/15/12 12:55 PM, emmanuel segura wrote:
>>>>>>
>>>>>>> I don't see any error, and the answer to your question is yes.
>>>>>>>
>>>>>>> Can you show me your /etc/cluster/cluster.conf and your "crm configure
>>>>>>> show"?
>>>>>>>
>>>>>>> That way I can try later to see if I can find a fix.
>>>>>>
>>>>>> Thanks for taking a look.
>>>>>>
>>>>>> My cluster.conf: <http://pastebin.com/w5XNYyAX>
>>>>>> crm configure show: <http://pastebin.com/atVkXjkn>
>>>>>>
>>>>>> Before you spend a lot of time on the second file, remember that clvmd
>>>>>> will hang whether or not I'm running pacemaker.
>>>>>>
>>>>>>> On 15 March 2012 17:42, William Seligman
>>>>>>> <[email protected]> wrote:
>>>>>>>
>>>>>>>> On 3/15/12 12:15 PM, emmanuel segura wrote:
>>>>>>>>
>>>>>>>>> How did you create your volume group?
>>>>>>>>
>>>>>>>> pvcreate /dev/drbd0
>>>>>>>> vgcreate -c y ADMIN /dev/drbd0
>>>>>>>> lvcreate -L 200G -n usr ADMIN   # ... and so on
>>>>>>>> # "Nevis_HA" is the cluster name I used in cluster.conf
>>>>>>>> mkfs.gfs2 -p lock_dlm -j 2 -t Nevis_HA:usr /dev/ADMIN/usr   # ... and so on
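>>>>>>>>
>>>>>>>> (For context: -j 2 creates one gfs2 journal per node that will
>>>>>>>> mount the filesystem, and the -t lock table name must be
>>>>>>>> <cluster name from cluster.conf>:<fs name> or the mount is refused.)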
>>>>>>>>
>>>>>>>>> Give me the output of the vgs command when the cluster is up.
>>>>>>>>
>>>>>>>> Here it is:
>>>>>>>>
>>>>>>>>    Logging initialised at Thu Mar 15 12:40:39 2012
>>>>>>>>    Set umask from 0022 to 0077
>>>>>>>>    Finding all volume groups
>>>>>>>>    Finding volume group "ROOT"
>>>>>>>>    Finding volume group "ADMIN"
>>>>>>>>  VG    #PV #LV #SN Attr   VSize   VFree
>>>>>>>>  ADMIN   1   5   0 wz--nc   2.61t 765.79g
>>>>>>>>  ROOT    1   2   0 wz--n- 117.16g      0
>>>>>>>>    Wiping internal VG cache
>>>>>>>>
>>>>>>>> I assume the "c" in the ADMIN attributes means that clustering is
>>>>>>>> turned on?
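>>>>>>>>
>>>>>>>> (As far as I know, the sixth attribute character is the clustered
>>>>>>>> bit, so ADMIN's "wz--nc" is clustered while ROOT's "wz--n-" is not.)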
>>>>>>>>
>>>>>>>>> On 15 March 2012 17:06, William Seligman
>>>>>>>>> <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> On 3/15/12 11:50 AM, emmanuel segura wrote:
>>>>>>>>>>> Yes, William.
>>>>>>>>>>>
>>>>>>>>>>> Now try clvmd -d and see what happens.
>>>>>>>>>>>
>>>>>>>>>>> locking_type = 3 is the lvm cluster locking type.
>>>>>>>>>>
>>>>>>>>>> Since you asked for confirmation, here it is: the output of 'clvmd -d'
>>>>>>>>>> just now: <http://pastebin.com/bne8piEw>. I crashed the other node at
>>>>>>>>>> Mar 15 12:02:35, when you see the only additional line of output.
>>>>>>>>>>
>>>>>>>>>> I don't see any particular difference between this and the previous
>>>>>>>>>> result <http://pastebin.com/sWjaxAEF>, which suggests that I had
>>>>>>>>>> cluster locking enabled before, and still do now.
>>>>>>>>>>
>>>>>>>>>>> On 15 March 2012 16:15, William Seligman
>>>>>>>>>>> <[email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> On 3/15/12 5:18 AM, emmanuel segura wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> The first thing I saw in your clvmd log is this:
>>>>>>>>>>>>>
>>>>>>>>>>>>> =============================================
>>>>>>>>>>>>> WARNING: Locking disabled. Be careful! This could corrupt your metadata.
>>>>>>>>>>>>> =============================================
>>>>>>>>>>>>
>>>>>>>>>>>> I saw that too, and thought the same as you did. I did some checks
>>>>>>>>>>>> (see below), but some web searches suggest that this message is a
>>>>>>>>>>>> normal consequence of clvmd initialization; e.g.,
>>>>>>>>>>>>
>>>>>>>>>>>> <http://markmail.org/message/vmy53pcv52wu7ghx>
>>>>>>>>>>>>
>>>>>>>>>>>>> Use this command:
>>>>>>>>>>>>>
>>>>>>>>>>>>> lvmconf --enable-cluster
>>>>>>>>>>>>>
>>>>>>>>>>>>> And remember: for cman+pacemaker you don't need qdisk.
>>>>>>>>>>>>
>>>>>>>>>>>> Before I tried your lvmconf suggestion, here was my /etc/lvm/lvm.conf:
>>>>>>>>>>>> <http://pastebin.com/841VZRzW> and the output of "lvm dumpconfig":
>>>>>>>>>>>> <http://pastebin.com/rtw8c3Pf>.
>>>>>>>>>>>>
>>>>>>>>>>>> Then I did as you suggested, but with a check to see if anything
>>>>>>>>>>>> changed:
>>>>>>>>>>>>
>>>>>>>>>>>> # cd /etc/lvm/
>>>>>>>>>>>> # cp lvm.conf lvm.conf.cluster
>>>>>>>>>>>> # lvmconf --enable-cluster
>>>>>>>>>>>> # diff lvm.conf lvm.conf.cluster
>>>>>>>>>>>> #
>>>>>>>>>>>>
>>>>>>>>>>>> So the key lines have been there all along:
>>>>>>>>>>>>    locking_type = 3
>>>>>>>>>>>>    fallback_to_local_locking = 0
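>>>>>>>>>>>>
>>>>>>>>>>>> (For reference, both settings live in the global section of
>>>>>>>>>>>> lvm.conf; a minimal sketch:
>>>>>>>>>>>>
>>>>>>>>>>>> global {
>>>>>>>>>>>>     locking_type = 3                # 3 = built-in clustered locking via clvmd
>>>>>>>>>>>>     fallback_to_local_locking = 0   # never drop back to local locking
>>>>>>>>>>>> }
>>>>>>>>>>>>
>>>>>>>>>>>> so commands should always go through clvmd rather than bypass it.)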
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>> On 14 March 2012 23:17, William Seligman
>>>>>>>>>>>>> <[email protected]> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 3/14/12 9:20 AM, emmanuel segura wrote:
>>>>>>>>>>>>>>> Hello William
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I didn't know you are using drbd, and I don't know what type of
>>>>>>>>>>>>>>> configuration you're using.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> But it's better if you try to start clvmd with clvmd -d;
>>>>>>>>>>>>>>> that way we can see what the problem is.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> For what it's worth, here's the output of running clvmd -d on
>>>>>>>>>>>>>> the node that stays up: <http://pastebin.com/sWjaxAEF>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> What's probably important in that big mass of output are the
>>>>>>>>>>>>>> last two lines. Up to that point, I have both nodes up and
>>>>>>>>>>>>>> running cman + clvmd; cluster.conf is here:
>>>>>>>>>>>>>> <http://pastebin.com/w5XNYyAX>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> At the time of the next-to-the-last line, I cut power to the
>>>>>>>>>>>>>> other node.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> At the time of the last line, I run "vgdisplay" on the
>>>>>>>>>>>>>> remaining node, which hangs forever.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> After a lot of web searching, I found that I'm not the only one
>>>>>>>>>>>>>> with this problem. Here's one case that doesn't seem relevant
>>>>>>>>>>>>>> to me, since I don't use qdisk:
>>>>>>>>>>>>>> <http://www.redhat.com/archives/linux-cluster/2007-October/msg00212.html>.
>>>>>>>>>>>>>> Here's one with the same problem with the same OS:
>>>>>>>>>>>>>> <http://bugs.centos.org/view.php?id=5229>, but with no resolution.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Out of curiosity, has anyone on this list made a two-node
>>>>>>>>>>>>>> cman+clvmd cluster work for them?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On 14 March 2012 14:02, William Seligman
>>>>>>>>>>>>>>> <[email protected]> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On 3/14/12 6:02 AM, emmanuel segura wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I think it's better you make clvmd start at boot:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> chkconfig cman on ; chkconfig clvmd on
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I've already tried it. It doesn't work. The problem is that
>>>>>>>>>>>>>>>> my LVM information is on the drbd. If I start up clvmd
>>>>>>>>>>>>>>>> before drbd, it won't find the logical volumes.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I also don't see why that would make a difference (although
>>>>>>>>>>>>>>>> this could be part of the confusion): a service is a
>>>>>>>>>>>>>>>> service. I've tried starting up clvmd inside and outside
>>>>>>>>>>>>>>>> pacemaker control, with the same problem. Why would
>>>>>>>>>>>>>>>> starting clvmd at boot make a difference?
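>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> (Under pacemaker the ordering is explicit anyway. Roughly, with
>>>>>>>>>>>>>>>> placeholder resource names, my constraints amount to:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> order o_drbd_before_clvmd inf: ms_drbd:promote cl_clvmd:start
>>>>>>>>>>>>>>>> order o_clvmd_before_gfs2 inf: cl_clvmd:start cl_gfs2:start
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> The real configuration is in the "crm configure show" pastebin
>>>>>>>>>>>>>>>> above.)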
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On 13 March 2012 23:29, William Seligman
>>>>>>>>>>>>>>>>> <[email protected]> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On 3/13/12 5:50 PM, emmanuel segura wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> So if you're using cman, why do you use lsb::clvmd?
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I think you are very confused.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I don't dispute that I may be very confused!
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> However, from what I can tell, I still need to run
>>>>>>>>>>>>>>>>>> clvmd even if I'm running cman (I'm not using
>>>>>>>>>>>>>>>>>> rgmanager). If I just run cman, gfs2 and any other form
>>>>>>>>>>>>>>>>>> of mount fails. If I run cman, then clvmd, then gfs2,
>>>>>>>>>>>>>>>>>> everything behaves normally.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Going by these instructions:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> <https://alteeve.com/w/2-Node_Red_Hat_KVM_Cluster_Tutorial>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> the resources he puts under "cluster control"
>>>>>>>>>>>>>>>>>> (rgmanager) I have to put under pacemaker control.
>>>>>>>>>>>>>>>>>> Those include drbd, clvmd, and gfs2.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> The difference between what I've got, and what's in
>>>>>>>>>>>>>>>>>> "Clusters From Scratch", is that in CFS they assign one DRBD
>>>>>>>>>>>>>>>>>> volume to a single filesystem. I create an LVM physical
>>>>>>>>>>>>>>>>>> volume on my DRBD resource, as in the above tutorial,
>>>>>>>>>>>>>>>>>> and so I have to start clvmd or the logical volumes in
>>>>>>>>>>>>>>>>>> the DRBD partition won't be recognized.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Is there some way to get logical volumes recognized
>>>>>>>>>>>>>>>>>> automatically by cman without rgmanager that I've missed?
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On 13 March 2012 22:42, William Seligman
>>>>>>>>>>>>>>>>>>> <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>  On 3/13/12 12:29 PM, William Seligman wrote:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> I'm not sure if this is a "Linux-HA" question;
>>>>>>>>>>>>>>>>>>>>> please direct me to the appropriate list if it's
>>>>>>>>>>>>>>>>>>>>> not.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> I'm setting up a two-node cman+pacemaker+gfs2
>>>>>>>>>>>>>>>>>>>>> cluster as described in "Clusters From Scratch."
>>>>>>>>>>>>>>>>>>>>> Fencing is through forcibly rebooting a node by
>>>>>>>>>>>>>>>>>>>>> cutting and restoring its power via UPS.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> My fencing/failover tests have revealed a
>>>>>>>>>>>>>>>>>>>>> problem. If I gracefully turn off one node ("crm
>>>>>>>>>>>>>>>>>>>>> node standby"; "service pacemaker stop";
>>>>>>>>>>>>>>>>>>>>> "shutdown -r now") all the resources transfer to
>>>>>>>>>>>>>>>>>>>>> the other node with no problems. If I cut power
>>>>>>>>>>>>>>>>>>>>> to one node (as would happen if it were fenced),
>>>>>>>>>>>>>>>>>>>>> the lsb::clvmd resource on the remaining node
>>>>>>>>>>>>>>>>>>>>> eventually fails. Since all the other resources
>>>>>>>>>>>>>>>>>>>>> depend on clvmd, all the resources on the
>>>>>>>>>>>>>>>>>>>>> remaining node stop and the cluster is left with
>>>>>>>>>>>>>>>>>>>>> nothing running.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> I've traced why the lsb::clvmd fails: The
>>>>>>>>>>>>>>>>>>>>> monitor/status command includes "vgdisplay",
>>>>>>>>>>>>>>>>>>>>> which hangs indefinitely. Therefore the monitor
>>>>>>>>>>>>>>>>>>>>> will always time-out.
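>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> (This is easy to reproduce by hand, using coreutils
>>>>>>>>>>>>>>>>>>>>> timeout(1) as a stand-in for the monitor's timeout:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> # timeout 30 vgdisplay; echo "exit: $?"
>>>>>>>>>>>>>>>>>>>>> exit: 124
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> where 124 means the command was killed at the deadline.)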
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> So this isn't a problem with pacemaker, but with
>>>>>>>>>>>>>>>>>>>>> clvmd/dlm: If a node is cut off, the cluster
>>>>>>>>>>>>>>>>>>>>> isn't handling it properly. Has anyone on this
>>>>>>>>>>>>>>>>>>>>> list seen this before? Any ideas?
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Details:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> versions:
>>>>>>>>>>>>>>>>>>>>> Redhat Linux 6.2 (kernel 2.6.32)
>>>>>>>>>>>>>>>>>>>>> cman-3.0.12.1
>>>>>>>>>>>>>>>>>>>>> corosync-1.4.1
>>>>>>>>>>>>>>>>>>>>> pacemaker-1.1.6
>>>>>>>>>>>>>>>>>>>>> lvm2-2.02.87
>>>>>>>>>>>>>>>>>>>>> lvm2-cluster-2.02.87
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> This may be a Linux-HA question after all!
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> I ran a few more tests. Here's the output from a
>>>>>>>>>>>>>>>>>>>> typical test of
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> grep -E "(dlm|gfs2|clvmd|fenc|syslogd)" /var/log/messages
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> <http://pastebin.com/uqC6bc1b>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> It looks like what's happening is that the fence
>>>>>>>>>>>>>>>>>>>> agent (one I wrote) is not returning the proper
>>>>>>>>>>>>>>>>>>>> error code when a node crashes. According to this
>>>>>>>>>>>>>>>>>>>> page, if a fencing agent fails GFS2 will freeze to
>>>>>>>>>>>>>>>>>>>> protect the data:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> <http://docs.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/6/html/Global_File_System_2/s1-gfs2hand-allnodes.html>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> As a test, I tried to fence my test node via
>>>>>>>>>>>>>>>>>>>> standard means:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> stonith_admin -F \
>>>>>>>>>>>>>>>>>>>> orestes-corosync.nevis.columbia.edu
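>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> (-F is stonith_admin's fence, i.e. power-off, action; -U
>>>>>>>>>>>>>>>>>>>> would un-fence the named host.)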
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> These were the log messages, which show that
>>>>>>>>>>>>>>>>>>>> stonith_admin did its job and CMAN was notified of
>>>>>>>>>>>>>>>>>>>> the fencing: <http://pastebin.com/jaH820Bv>.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Unfortunately, I still got the gfs2 freeze, so this
>>>>>>>>>>>>>>>>>>>> is not the complete story.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> First things first. I vaguely recall a web page
>>>>>>>>>>>>>>>>>>>> that went over the STONITH return codes, but I
>>>>>>>>>>>>>>>>>>>> can't locate it again. Is there any reference to
>>>>>>>>>>>>>>>>>>>> the return codes expected from a fencing agent,
>>>>>>>>>>>>>>>>>>>> perhaps as function of the state of the fencing
>>>>>>>>>>>>>>>>>>>> device?
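>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> (My working assumption, from the FenceAgentAPI convention,
>>>>>>>>>>>>>>>>>>>> is that the contract is simply the process exit status:
>>>>>>>>>>>>>>>>>>>> 0 on success, non-zero on failure. E.g., with a
>>>>>>>>>>>>>>>>>>>> hypothetical agent and options:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> # fence_myups -o off -n orestes ; echo $?
>>>>>>>>>>>>>>>>>>>> 0
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> but I'd welcome an authoritative reference.)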


-- 
Bill Seligman             | Phone: (914) 591-2823
Nevis Labs, Columbia Univ | mailto://[email protected]
PO Box 137                |
Irvington NY 10533 USA    | http://www.nevis.columbia.edu/~seligman/

