Hello all,
I've been trying to configure the vcenter stonith plugin discussed
here: http://comments.gmane.org/gmane.linux.highavailability.devel/6407
without much luck. I've taken all the preliminary steps: installing
the VMware vSphere CLI, generating a credentials file, and so on.
I'm running Ubuntu 10.04.2, with heartbeat 3.0.3-1ubuntu1 and
pacemaker 1.0.8+hg15494-2ubuntu2.
I can even test it successfully using stonith(8):
#!/bin/bash
# file: test-vcenter-stonith.sh
hosts="\
mb1-haproxy01.fqdn=mb1-haproxy01;\
mb1-haproxy02.fqdn=mb1-haproxy02;\
mb1-haproxy03.fqdn=mb1-haproxy03"
stonith -t external/vcenter \
VI_SERVER=vcenter1 \
VI_CREDSTORE=/etc/vicredentials.xml \
HOSTLIST="$hosts" \
RESETPOWERON='0' \
VI_PORTNUMBER=443 \
VI_PROTOCOL=https \
VI_SERVICEPATH=/sdk/webService \
-lS
# -- end file --
$ sudo ./test-vcenter-stonith.sh
stonith: external/vcenter device OK.
mb1-haproxy02.fqdn
mb1-haproxy03.fqdn
mb1-haproxy01.fqdn
But when I use the following CIB config:
node $id="432e1c50-7ad4-4b92-b048-477a237d778d" mb1-haproxy03.fqdn
node $id="78ce6a3b-c507-4a4c-8076-bcab152d1b7c" mb1-haproxy02.fqdn
node $id="f634853f-486d-4357-93f1-76823c12dd56" mb1-haproxy01.fqdn
primitive mb1_haproxy_vip02 ocf:heartbeat:IPaddr \
params ip="10.51.15.79" cidr_netmask="255.255.252.0" nic="eth0" \
op monitor interval="40s" timeout="20s"
primitive mb1_haproxy_vip03 ocf:heartbeat:IPaddr \
params ip="10.51.15.80" cidr_netmask="255.255.252.0" nic="eth0" \
op monitor interval="40s" timeout="20s"
primitive vcenter_fencing stonith:external/vcenter \
params VI_SERVER="vcenter1" \
VI_PORTNUMBER="443" \
VI_PROTOCOL="https" \
VI_SERVICEPATH="/sdk/webService" \
VI_CREDSTORE="/etc/vicredentials.xml" \
HOSTLIST="mb1-haproxy01.fqdn=mb1-haproxy01;mb1-haproxy02.fqdn=mb1-haproxy02;mb1-haproxy03.fqdn=mb1-haproxy03" \
RESETPOWERON="0" \
op monitor interval="60s"
location mb1_haproxy_vip02_failover mb1_haproxy_vip02 50: mb1-haproxy01.fqdn
location mb1_haproxy_vip02_pref mb1_haproxy_vip02 100: mb1-haproxy02.fqdn
location mb1_haproxy_vip03_failover mb1_haproxy_vip03 50: mb1-haproxy01.fqdn
location mb1_haproxy_vip03_pref mb1_haproxy_vip03 100: mb1-haproxy03.fqdn
property $id="cib-bootstrap-options" \
dc-version="1.0.8-042548a451fce8400660f6031f4da6f0223dd5dd" \
cluster-infrastructure="Heartbeat" \
stonith-enabled="false" \
stonith-timeout="60s"
-------------------------------------------------------
I get the following from crm_mon -1:
============
Last updated: Fri Jul 15 19:38:04 2011
Stack: Heartbeat
Current DC: mb1-haproxy02.fqdn (78ce6a3b-c507-4a4c-8076-bcab152d1b7c) - partition with quorum
Version: 1.0.8-042548a451fce8400660f6031f4da6f0223dd5dd
3 Nodes configured, unknown expected votes
3 Resources configured.
============
Online: [ mb1-haproxy02.fqdn mb1-haproxy03.fqdn ]
OFFLINE: [ mb1-haproxy01.fqdn ]
mb1_haproxy_vip02 (ocf::heartbeat:IPaddr): Started mb1-haproxy02.fqdn
mb1_haproxy_vip03 (ocf::heartbeat:IPaddr): Started mb1-haproxy03.fqdn
Failed actions:
vcenter_fencing_start_0 (node=mb1-haproxy02.fqdn, call=10, rc=1, status=complete): unknown error
vcenter_fencing_start_0 (node=mb1-haproxy03.fqdn, call=8, rc=1, status=complete): unknown error
------------------------------------------------------
and I see this in the logs:
info: determine_online_status: Node mb1-haproxy02.fqdn is online
WARN: unpack_rsc_op: Processing failed op vcenter_fencing_start_0 on mb1-haproxy02.fqdn: unknown error (1)
info: determine_online_status: Node mb1-haproxy03.fqdn is online
WARN: unpack_rsc_op: Processing failed op vcenter_fencing_start_0 on mb1-haproxy03.fqdn: unknown error (1)
notice: native_print: mb1_haproxy_vip02#011(ocf::heartbeat:IPaddr):#011Started mb1-haproxy02.fqdn
notice: native_print: mb1_haproxy_vip03#011(ocf::heartbeat:IPaddr):#011Started mb1-haproxy03.fqdn
notice: native_print: vcenter_fencing#011(stonith:external/vcenter):#011Stopped
info: get_failcount: vcenter_fencing has failed 1000000 times on mb1-haproxy02.fqdn
WARN: common_apply_stickiness: Forcing vcenter_fencing away from mb1-haproxy02.fqdn after 1000000 failures (max=1000000)
info: get_failcount: vcenter_fencing has failed 1000000 times on mb1-haproxy03.fqdn
WARN: common_apply_stickiness: Forcing vcenter_fencing away from mb1-haproxy03.fqdn after 1000000 failures (max=1000000)
info: native_color: Resource vcenter_fencing cannot run anywhere
I'm confused as to why it would work using stonith on the
command line but fail when run by pacemaker. I'm afraid I don't know
how to debug the issue either, as there doesn't seem to be any error
output that tells me *what* is misconfigured.
I'd appreciate any help that could be offered. I've been banging my
head against different STONITH options for about the past week,
and this one looks like the best fit for our environment.
Thanks,
Jonathan
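
P.S. In case it helps anyone reproduce this: as far as I can tell,
HOSTLIST is a semicolon-separated list of hostname=VM-name pairs. Below
is a minimal sketch of how the value I used could be generated, assuming
the VM name is simply the short hostname (true for my setup, not
necessarily for yours). build_hostlist is a made-up helper name, not
part of the plugin.

```bash
#!/bin/bash
# build_hostlist: hypothetical helper, NOT part of external/vcenter.
# Joins "hostname=vmname" pairs with ';'. The VM name is assumed to be
# the hostname with everything from the first dot onward removed.
build_hostlist() {
    local out="" node
    for node in "$@"; do
        # ${out:+;} adds the separator only once out is non-empty
        out+="${out:+;}${node}=${node%%.*}"
    done
    printf '%s\n' "$out"
}

hosts=$(build_hostlist mb1-haproxy01.fqdn mb1-haproxy02.fqdn mb1-haproxy03.fqdn)
echo "$hosts"
```

The resulting string can be passed quoted as HOSTLIST="$hosts" to the
stonith test, or pasted into the primitive's params.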
_______________________________________________
Linux-HA mailing list
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems