Re: [Linux-HA] How do I clear the Failed actions section?

William Seligman Thu, 08 Mar 2012 04:33:48 -0800

On 3/8/12 6:53 AM, Helmut Wollmersdorfer wrote:


On 07.03.2012 at 18:01, Florian Haas wrote:

On Wed, Mar 7, 2012 at 5:51 PM, William Seligman
<[email protected]>  wrote:

Again, a disclaimer: I am not an expert.


Your advice was spot on. :)


But what should I do if cleanup is not working, even though everything is running?

# crm status
============
Last updated: Thu Mar  8 12:27:00 2012
Stack: Heartbeat
Current DC: xen10 (5ab5ba3d-3be5-4763-83e7-90aaa49361a6) - partition with quorum
Version: 1.0.9-74392a28b7f31d7ddc86689598bd23114f58978b
2 Nodes configured, unknown expected votes
12 Resources configured.
============

Online: [ xen10 xen11 ]

   xen_www      (ocf::heartbeat:Xen):   Started xen11
   Master/Slave Set: DrbdClone1
       Masters: [ xen11 ]
       Slaves: [ xen10 ]
   xen_typo3    (ocf::heartbeat:Xen):   Started xen11
   xen_shopdb   (ocf::heartbeat:Xen):   Started xen10
   xen_admintool        (ocf::heartbeat:Xen):   Started xen11
   xen_cmsdb    (ocf::heartbeat:Xen):   Started xen11
   Master/Slave Set: DrbdClone2
       Resource Group: group_drbd2:0
           xen_drbd2_1:0        (ocf::linbit:drbd):     Slave xen10 (unmanaged) FAILED
           xen_drbd2_2:0        (ocf::linbit:drbd):     Stopped
       Masters: [ xen11 ]
   Master/Slave Set: DrbdClone3
       Masters: [ xen10 ]
       Slaves: [ xen11 ]
   Master/Slave Set: DrbdClone5
       Masters: [ xen11 ]
       Slaves: [ xen10 ]
   Master/Slave Set: DrbdClone6
       Slaves: [ xen11 xen10 ]
   Master/Slave Set: DrbdClone4
       Masters: [ xen11 ]
       Slaves: [ xen10 ]

Failed actions:
      xen_cmsdb_monitor_3000 (node=xen10, call=571, rc=7, status=complete): not running
      xen_drbd1_2:1_promote_0 (node=xen10, call=5205, rc=1, status=complete): unknown error
      xen_drbd2_1:1_promote_0 (node=xen10, call=790, rc=1, status=complete): unknown error
      xen_ns2_monitor_3000 (node=xen10, call=601, rc=7, status=complete): not running
      xen_drbd3_1:1_promote_0 (node=xen10, call=383, rc=-2, status=Timed Out): unknown exec error
      xen_drbd2_1:0_promote_0 (node=xen10, call=1326, rc=-2, status=Timed Out): unknown exec error
      xen_drbd2_1:0_stop_0 (node=xen10, call=1348, rc=-2, status=Timed Out): unknown exec error

xen11:# crm resource cleanup xen_drbd2_1
Error performing operation: The object/attribute does not exist
Error performing operation: The object/attribute does not exist


Given the list of resources displayed by crm_mon, the command you need is

crm resource cleanup DrbdClone2

I can't say whether that will fix your problems, but you won't get the "does not exist" message.

Somewhere in either "Pacemaker Explained" or "Clusters From Scratch", it says that once you wrap a resource in a clone or ms (master/slave) set, you can no longer refer to that resource individually; you have to use the clone/ms name.
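
If it is not obvious which clone/ms name wraps a given primitive, the configuration text can be walked upward. This is only a rough sketch: top_level_resource is a made-up helper name (not part of the crm shell), and it assumes the one-definition-per-line layout that "crm configure show" normally prints, with the group/clone/ms name and its children on the first line of each definition.

```shell
#!/bin/sh
# Sketch: climb from a resource ID to the outermost group/clone/ms
# that contains it, using "crm configure show" style text.
# top_level_resource is a hypothetical helper, not a crm shell command.
top_level_resource() (
    rsc=$1      # resource ID to start from, e.g. xen_drbd2_1
    config=$2   # configuration text, e.g. "$(crm configure show)"
    while :; do
        # Find a group/clone/ms definition that lists $rsc as a child.
        parent=$(printf '%s\n' "$config" | awk -v r="$rsc" '
            $1 == "group" || $1 == "clone" || $1 == "ms" {
                for (i = 3; i <= NF; i++)
                    if ($i == r) { print $2; exit }
            }')
        # No container found: $rsc is already the top-level name.
        [ -n "$parent" ] || break
        rsc=$parent
    done
    printf '%s\n' "$rsc"
)
```

On a live cluster this could be combined with the cleanup command, for example: crm resource cleanup "$(top_level_resource xen_drbd2_1 "$(crm configure show)")". If the resource is not wrapped in anything, the helper simply prints the name it was given.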

When I was faced with a problem like yours, I ran "cat /proc/drbd", looked at the lines for the failed drbd device, and fixed it by hand. Then I ran the cleanup command so that pacemaker would pick up the current state of the resource.
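
To make the "look at /proc/drbd" step less of an eyeball exercise, something like the following could list the devices that need attention. Treat it as a sketch under assumptions: unhealthy_drbd_minors is a made-up helper name, and it assumes the DRBD 8.3 layout of /proc/drbd, where each device line starts with the minor number followed by cs:, ro: and ds: fields.

```shell
#!/bin/sh
# Sketch: print DRBD minors that are not both Connected and
# UpToDate/UpToDate. Reads /proc/drbd style text on stdin
# (DRBD 8.3 layout assumed).
# unhealthy_drbd_minors is a hypothetical helper name.
unhealthy_drbd_minors() {
    awk '
        # Device lines look like:
        #  3: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r----
        $1 ~ /^[0-9]+:$/ && $2 ~ /^cs:/ {
            cs = ""; ds = ""
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^cs:/) cs = substr($i, 4)
                if ($i ~ /^ds:/) ds = substr($i, 4)
            }
            if (cs != "Connected" || ds != "UpToDate/UpToDate") {
                minor = $1
                sub(/:$/, "", minor)
                # Output: minor number, connection state, disk states
                print minor, cs, ds
            }
        }'
}
```

Run it as: unhealthy_drbd_minors < /proc/drbd. Once the listed devices have been repaired by hand, the cleanup command on the clone/ms name (for example, crm resource cleanup DrbdClone2) lets pacemaker re-probe the resource.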

# xm list
Name                                        ID   Mem VCPUs      State   Time(s)
Domain-0                                     0  1005    16     r-----   40648.5
admintool                                    5  4096     2     -b----    7455.4
cmsdb                                        3  2048     2     -b----    2106.5
typo3                                        2  1024     2     -b----    2890.9
www                                          1  1024     1     -b----     855.0


xen11:# drbdadm status
<drbd-status version="8.3.7" api="88">
<resources config_file="/etc/drbd.conf">
<resource minor="1" name="drbd1_1" cs="Connected" ro1="Primary" ro2="Secondary" ds1="UpToDate" ds2="UpToDate" />
<resource minor="2" name="drbd1_2" cs="Connected" ro1="Primary" ro2="Secondary" ds1="UpToDate" ds2="UpToDate" />
<resource minor="3" name="drbd2_1" cs="Connected" ro1="Primary" ro2="Secondary" ds1="UpToDate" ds2="UpToDate" />
<resource minor="4" name="drbd2_2" cs="Connected" ro1="Primary" ro2="Secondary" ds1="UpToDate" ds2="UpToDate" />
<resource minor="5" name="drbd3_1" cs="Connected" ro1="Secondary" ro2="Primary" ds1="UpToDate" ds2="UpToDate" />
<resource minor="6" name="drbd3_2" cs="Connected" ro1="Secondary" ro2="Primary" ds1="UpToDate" ds2="UpToDate" />
<resource minor="7" name="drbd4_1" cs="Connected" ro1="Primary" ro2="Secondary" ds1="UpToDate" ds2="UpToDate" />
<resource minor="8" name="drbd4_2" cs="Connected" ro1="Primary" ro2="Secondary" ds1="UpToDate" ds2="UpToDate" />
<resource minor="9" name="drbd5_1" cs="Connected" ro1="Primary" ro2="Secondary" ds1="UpToDate" ds2="UpToDate" />
<resource minor="10" name="drbd5_2" cs="Connected" ro1="Primary" ro2="Secondary" ds1="UpToDate" ds2="UpToDate" />
<resource minor="11" name="drbd6_1" cs="StandAlone" ro1="Secondary" ro2="Unknown" ds1="Outdated" ds2="DUnknown" />
<resource minor="12" name="drbd6_2" cs="StandAlone" ro1="Secondary" ro2="Unknown" ds1="Outdated" ds2="DUnknown" />
<!-- resource minor="13" name="drbd7_1" not available or not yet created -->
<!-- resource minor="14" name="drbd7_2" not available or not yet created -->
<!-- resource minor="15" name="drbd8_1" not available or not yet created -->
<!-- resource minor="16" name="drbd8_2" not available or not yet created -->
</resources>
</drbd-status>

Helmut Wollmersdorfer


--
Bill Seligman             | mailto://[email protected]
Nevis Labs, Columbia Univ | http://www.nevis.columbia.edu/~seligman/
PO Box 137                |
Irvington NY 10533  USA   | Phone: (914) 591-2823


_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
