Hi guys,
I'm having trouble with my two-node cluster (Pacemaker + Corosync), which
runs some services on top of three DRBD-replicated resources.
The problem: when I unplug the Ethernet cable from one node, the
Master/Slave roles do not change, so the services cannot start on the node
that is still connected to the network.
However, if I simulate degraded connectivity (using iptables), the
switchover works well.
My running configuration is below, and I have two questions:
- Why is the attribute value of the MS resources "10000"? Is it a default
value?
- How can I fix my problem?
When I unplug the cable, I would like all the MS resources to become
Master on the node that is still connected.
Thank you for your help.
---------------------------------------------------------------------------
Configuration
---------------------------------------------------------------------------
node alfa
node beta
primitive ClusterIP ocf:heartbeat:IPaddr2 \
params ip="192.168.3.10" cidr_netmask="24" nic="eth0" iflabel="0" \
op monitor interval="2s"
primitive WebSite ocf:heartbeat:apache \
params configfile="/etc/httpd/conf/httpd.conf" \
op monitor start-delay="15s" interval="60s" \
op start interval="0" timeout="40s" \
op stop interval="0" timeout="60s"
primitive drbd_freeswitch ocf:linbit:drbd \
params drbd_resource="r2" \
op monitor interval="30s" \
op start interval="15" timeout="240s" \
op stop interval="0" timeout="100s"
primitive drbd_logAlfa ocf:linbit:drbd \
params drbd_resource="r0" \
op monitor interval="30s" \
op start interval="15" timeout="240s" \
op stop interval="0" timeout="100s"
primitive drbd_logBeta ocf:linbit:drbd \
params drbd_resource="r1" \
op monitor interval="30s" \
op start interval="15" timeout="240s" \
op stop interval="0" timeout="100s"
primitive freeswitch lsb:freeswitch \
op monitor interval="60s" \
op start interval="0" timeout="90s" \
op stop interval="0" timeout="100s"
primitive fs_drbd_freeswitch ocf:heartbeat:Filesystem \
params device="/dev/drbd2" directory="/data" fstype="ext3" \
op monitor interval="20s" timeout="40s" \
op start interval="15" timeout="60s" \
op stop interval="0" timeout="60s"
primitive fs_drbd_logAlfa ocf:heartbeat:Filesystem \
params device="/dev/drbd0" directory="/log_alfa" fstype="ext3" \
op monitor interval="20s" timeout="40s" \
op start interval="15" timeout="60s" \
op stop interval="0" timeout="60s"
primitive fs_drbd_logBeta ocf:heartbeat:Filesystem \
params device="/dev/drbd1" directory="/log_beta" fstype="ext3" \
op monitor interval="20s" timeout="40s" \
op start interval="15" timeout="60s" \
op stop interval="0" timeout="60s"
primitive pingd ocf:pacemaker:ping \
params host_list="alfa beta 192.168.3.100 192.168.3.1" multiplier="1000" attempts="2" \
op monitor interval="3s" timeout="60s" \
op start interval="0" timeout="60s" \
op stop interval="0" timeout="20s"
primitive resMON ocf:pacemaker:ClusterMon \
operations $id="resMON-operations" \
op monitor interval="180" timeout="20" \
op start interval="0" timeout="90s" \
op stop interval="0" timeout="100s" \
params htmlfile="/data/srv/www/cluster-info/index.html" extra_options="--snmp-trap 192.168.25.49"
group gr_freeswitch fs_drbd_freeswitch ClusterIP freeswitch resMON WebSite \
meta resource-stickiness="50"
ms ms_drbd_freeswitch drbd_freeswitch \
meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true" globally-unique="false"
ms ms_drbd_logAlfa drbd_logAlfa \
meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true" globally-unique="false"
ms ms_drbd_logBeta drbd_logBeta \
meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true" globally-unique="false"
clone pingdClone pingd \
meta globally-unique="false"
location lo_gr_freeswitch gr_freeswitch \
rule $id="lo_gr_freeswitch-rule" 100: #uname eq alfa \
rule $id="lo_gr_freeswitch-rule-0" -25000: not_defined pingd or pingd lte 1000 \
rule $id="lo_gr_freeswitch-rule-1" pingd: defined pingd
location ms_logAlfa__on__alfa ms_drbd_logAlfa \
rule $id="ms_logAlfa__on__alfa-rule" $role="master" 2000: #uname eq alfa
location ms_logBeta__on__beta ms_drbd_logBeta \
rule $id="ms_logBeta__on__beta-rule" $role="master" 2000: #uname eq beta
colocation freeswitch_on_drbd inf: gr_freeswitch ms_drbd_freeswitch:Master
colocation fs_logAlfa__on__drbd_logAlfa inf: fs_drbd_logAlfa ms_drbd_logAlfa:Master
colocation fs_logBeta__on__drbd_logBeta inf: fs_drbd_logBeta ms_drbd_logBeta:Master
order freeswitch_after_drbd inf: ms_drbd_freeswitch:promote gr_freeswitch:start
order fs_logAlfa__after__drbd_logAlfa inf: ms_drbd_logAlfa:promote fs_drbd_logAlfa:start
order fs_logBeta__after__drbd_logBeta inf: ms_drbd_logBeta:promote fs_drbd_logBeta:start
property $id="cib-bootstrap-options" \
stonith-enabled="false" \
default-resource-stickiness="1" \
no-quorum-policy="ignore" \
dc-version="1.1.5-1.1.el5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f" \
cluster-infrastructure="openais" \
expected-quorum-votes="2"
rsc_defaults $id="rsc-options" \
resource-stickiness="1"
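
To make the intent of the lo_gr_freeswitch constraint easier to check, here is a minimal Python sketch of how I understand its three rules to combine. It assumes Pacemaker adds up the scores of every matching rule for a node. The function name location_score is purely illustrative and not part of any Pacemaker tool, and the sketch deliberately ignores resource-stickiness, the colocation with the DRBD master, and the master-* scores.

```python
# Illustrative model of the three rules in "location lo_gr_freeswitch".
# Assumption: scores of all matching rules for a node are summed.

def location_score(node, pingd):
    """Return the combined location score of gr_freeswitch on `node`.

    `pingd` is the node's pingd attribute, or None when it is not defined.
    """
    score = 0
    # rule 100: #uname eq alfa
    if node == "alfa":
        score += 100
    # rule -25000: not_defined pingd or pingd lte 1000
    if pingd is None or pingd <= 1000:
        score += -25000
    # rule pingd: defined pingd (the attribute value itself is the score)
    if pingd is not None:
        score += pingd
    return score


if __name__ == "__main__":
    for node, pingd in [("alfa", 4000), ("beta", 4000), ("beta", 3000), ("alfa", None)]:
        print(node, pingd, location_score(node, pingd))
```

Under this model a node only gets a strongly negative score when it reaches at most one ping target, so a node that still reaches three of the four targets keeps a positive score.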
------------------------------------------------------------------------
crm_mon output when both nodes have the cable connected
------------------------------------------------------------------------
crm_mon -A1
============
Last updated: Mon May 9 11:24:18 2011
Stack: openais
Current DC: alfa - partition with quorum
Version: 1.1.5-1.1.el5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f
2 Nodes configured, 2 expected votes
7 Resources configured.
============
Online: [ alfa beta ]
Resource Group: gr_freeswitch
fs_drbd_freeswitch (ocf::heartbeat:Filesystem): Started alfa
ClusterIP (ocf::heartbeat:IPaddr2): Started alfa
freeswitch (lsb:freeswitch): Started alfa
resMON (ocf::pacemaker:ClusterMon): Started alfa
WebSite (ocf::heartbeat:apache): Started alfa
Master/Slave Set: ms_drbd_logAlfa [drbd_logAlfa]
Masters: [ alfa ]
Slaves: [ beta ]
Master/Slave Set: ms_drbd_logBeta [drbd_logBeta]
Masters: [ beta ]
Slaves: [ alfa ]
fs_drbd_logAlfa (ocf::heartbeat:Filesystem): Started alfa
fs_drbd_logBeta (ocf::heartbeat:Filesystem): Started beta
Master/Slave Set: ms_drbd_freeswitch [drbd_freeswitch]
Masters: [ alfa ]
Slaves: [ beta ]
Clone Set: pingdClone [pingd]
Started: [ alfa beta ]
Node Attributes:
* Node alfa:
+ master-drbd_freeswitch:0 : 10000
+ master-drbd_logAlfa:0 : 10000
+ master-drbd_logBeta:0 : 10000
+ pingd : 4000
* Node beta:
+ master-drbd_freeswitch:1 : 10000
+ master-drbd_logAlfa:1 : 10000
+ master-drbd_logBeta:1 : 10000
+ pingd : 4000
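
For reference, the pingd value of 4000 shown above follows from the ping resource settings: the attribute is the multiplier times the number of hosts in host_list that answered. The short Python sketch below only reproduces that arithmetic; ping_attribute is a made-up helper name, and the reachability tables are assumptions chosen to match my two scenarios, not measured data.

```python
# Arithmetic behind the "pingd" node attribute set by ocf:pacemaker:ping:
# value = multiplier * number of reachable hosts from host_list.

HOSTS = ["alfa", "beta", "192.168.3.100", "192.168.3.1"]  # host_list from the config
MULTIPLIER = 1000                                          # multiplier from the config


def ping_attribute(reachable, multiplier=MULTIPLIER):
    """`reachable` maps each host to True if it answered the ping."""
    return multiplier * sum(1 for ok in reachable.values() if ok)


# Both cables connected: every host answers.
all_up = {host: True for host in HOSTS}

# Seen from beta after alfa's cable is unplugged (assumed scenario).
alfa_unplugged = dict(all_up, alfa=False)

if __name__ == "__main__":
    print(ping_attribute(all_up))
    print(ping_attribute(alfa_unplugged))
```

Because the two cluster nodes are themselves in host_list, losing the peer alone already lowers the attribute by one multiplier step.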
---------------------------------------------------------------
crm_mon output when only "beta" has the cable connected
---------------------------------------------------------------
Last updated: Mon May 9 11:27:09 2011
Stack: openais
Current DC: beta - partition WITHOUT quorum
Version: 1.1.5-1.1.el5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f
2 Nodes configured, 2 expected votes
7 Resources configured.
============
Online: [ beta ]
OFFLINE: [ alfa ]
Master/Slave Set: ms_drbd_logAlfa [drbd_logAlfa]
Slaves: [ beta ]
Stopped: [ drbd_logAlfa:1 ]
Master/Slave Set: ms_drbd_logBeta [drbd_logBeta]
Masters: [ beta ]
Stopped: [ drbd_logBeta:0 ]
fs_drbd_logBeta (ocf::heartbeat:Filesystem): Started beta
Master/Slave Set: ms_drbd_freeswitch [drbd_freeswitch]
Slaves: [ beta ]
Stopped: [ drbd_freeswitch:1 ]
Clone Set: pingdClone [pingd]
Started: [ beta ]
Stopped: [ pingd:0 ]
Node Attributes:
* Node beta:
+ master-drbd_freeswitch:0 : 10000
+ master-drbd_logAlfa:0 : 10000
+ master-drbd_logBeta:1 : 10000
+ pingd : 3000 : Connectivity is degraded (Expected=4000)
Failed actions:
drbd_logAlfa:0_promote_0 (node=beta, call=1324, rc=-2, status=Timed Out): unknown exec error
drbd_freeswitch:0_promote_0 (node=beta, call=1331, rc=-2, status=Timed Out): unknown exec error
drbd_logAlfa:1_promote_0 (node=beta, call=1354, rc=-2, status=Timed Out): unknown exec error
drbd_freeswitch:1_promote_0 (node=beta, call=1355, rc=-2, status=Timed Out): unknown exec error
:: SYSNET TELEMATICA srl ::