Fellows

I have a two-node cluster (db-sql1, db-sql3) and am trying to configure STONITH over IPMI between the nodes. IPMI itself works fine: I successfully rebooted the hosts using the same command that the driver's implementation uses (/usr/bin/ipmitool -I lan -H ${ipaddr} -U ${userid} -P ${passwd} power reset).
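For reference, the manual check described above can be sketched as a small shell helper. This is only an illustration, not the driver's code: the function name ipmi_power and the DRY_RUN variable are hypothetical, and the default action is the non-destructive "power status" rather than "power reset". By default it only prints the command it would run.

```shell
#!/bin/sh
# Hypothetical helper: build the ipmitool command line used for the manual test.
# With DRY_RUN=1 (the default here) the command is printed, not executed.
ipmi_power() {
    ipaddr=$1
    userid=$2
    passwd=$3
    action=${4:-status}   # "status" is harmless; "reset" reboots the host

    # Reuse the positional parameters to hold the full command line.
    set -- /usr/bin/ipmitool -I lan -H "$ipaddr" -U "$userid" -P "$passwd" power "$action"

    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Example call with the placeholder values from the configuration in this post.
ipmi_power db-sql1-ipmi guess-who keep-trying status
```

Running it with DRY_RUN=0 executes the command for real, so the "reset" action should only be passed when a reboot of the target host is actually intended.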

I wrote the following XML configuration, following the DTD shipped with my version of the heartbeat software (CentOS, 2.1.3-21.1, with kernel 2.6.18-53.1.14.el5):

   <primitive id="db-sql1-shooter" class="stonith" type="external/ipmi" provider="heartbeat">
     <operations>
       <op id="op-sql1-shooter-stop" name="stop" timeout="60s"/>
       <op id="op-sql1-shooter-start" name="start" timeout="30s"/>
       <op id="op-sql1-shooter-monitor" name="monitor" timeout="5s" interval="10s"/>
     </operations>
     <instance_attributes id="df585416-074e-4955-a431-862f529e5b0b">
       <attributes>
         <nvpair name="hostname" value="db-sql1" id="ee671a7b-9322-487e-8a19-689c85b0df65"/>
         <nvpair name="ipaddr" value="db-sql1-ipmi" id="2931750a-4304-48df-ac80-eb65619f1d33"/>
         <nvpair name="userid" value="guess-who" id="bbd8a701-b5aa-4a93-8c2e-7a838d5d3558"/>
         <nvpair name="passwd" value="keep-trying" id="3c6ce903-6e7f-45e7-86cb-b6c63b92ddb0"/>
       </attributes>
     </instance_attributes>
   </primitive>
   <primitive id="db-sql3-shooter" class="stonith" type="external/ipmi" provider="heartbeat">
     <operations>
       <op id="op-sql3-shooter-stop" name="stop" timeout="60s"/>
       <op id="op-sql3-shooter-start" name="start" timeout="30s"/>
       <op id="op-sql3-shooter-monitor" name="monitor" timeout="5s" interval="10s"/>
     </operations>
     <instance_attributes id="2a0dba3a-f121-4695-9ed2-5c3407d2d38f">
       <attributes>
         <nvpair name="hostname" value="db-sql3" id="8a57bbb6-06f0-4916-a2fe-1ef54cba637f"/>
         <nvpair name="ipaddr" value="db-sql3-ipmi" id="9b544cea-8da5-4945-a30c-84320536463f"/>
         <nvpair name="userid" value="guess-who" id="bf1cb9f4-5af5-4cd3-906f-239d87b04e66"/>
         <nvpair name="passwd" value="keep-trying" id="0e784680-871e-4c55-bec5-9f534620c032"/>
       </attributes>
     </instance_attributes>
   </primitive>
 </resources>

Also, on Lars' recommendation, I added two constraints to my constraint set to prevent silly warnings during the startup of the stonith resources:

<rsc_location id="db-sql1-shooter-run-on-sql3" description="SQL1 shooter must be at SQL3" rsc="db-sql1-shooter" node="db-sql3" score="+INFINITY"/>
<rsc_location id="db-sql3-shooter-run-on-sql1" description="SQL3 shooter must be at SQL1" rsc="db-sql3-shooter" node="db-sql1" score="+INFINITY"/>

As the final result, crm_mon sees my cluster like this:

============
Last updated: Thu Nov 20 11:53:52 2008
Current DC: db-sql3.ripe.net (bdee5d1b-405a-4630-9836-66e8758e81f1)
2 Nodes configured.
4 Resources configured.
============

Node: db-sql3.ripe.net (bdee5d1b-405a-4630-9836-66e8758e81f1): online
Node: db-sql1.ripe.net (46818264-663c-43dd-b5e4-7b7cd7f85022): online

Master/Slave Set: database-disk
    database-storage-drbd:0 (heartbeat::ocf:drbd): Master db-sql3.ripe.net
    database-storage-drbd:1 (heartbeat::ocf:drbd): Started db-sql1.ripe.net
Resource Group: db-cluster-service
    database-filesystem (heartbeat::ocf:Filesystem): Started db-sql3.ripe.net
    database-ip (heartbeat::ocf:IPaddr):        Started db-sql3.ripe.net
    database-server     (heartbeat::ocf:mysql): Started db-sql3.ripe.net
db-sql1-shooter (stonith:external/ipmi):        Started db-sql3.ripe.net
db-sql3-shooter (stonith:external/ipmi):        Started db-sql1.ripe.net

Failed actions:
db-sql1-shooter_start_0 (node=db-sql1.ripe.net, call=14, rc=1): complete

and crm_verify -VL reports several warnings and an error that I am unable to interpret:

crm_verify[29480]: 2008/11/20_11:55:03 ERROR: unpack_rsc_op: Remapping db-sql1-shooter_start_0 (rc=1) on db-sql1.ripe.net to an ERROR
crm_verify[29480]: 2008/11/20_11:55:03 WARN: unpack_rsc_op: Processing failed op db-sql1-shooter_start_0 on db-sql1.ripe.net: Error
crm_verify[29480]: 2008/11/20_11:55:03 WARN: unpack_rsc_op: Compatability handling for failed op db-sql1-shooter_start_0 on db-sql1.ripe.net

I am stuck and need help. I don't know how to diagnose this, or which part of the source code to read to find out what is going on. Any suggestions, pointers, tips, tricks, or support will be highly appreciated.

Many thanks in advance.
Kind regards
--
Luis Motta Campos is a software engineer,
Perl Programmer, foodie and photographer.
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
