On 05/09/2011 09:07 PM, luca bianchi wrote:
> Hi guys,
> I'm in trouble with my 2 servers cluster (pacemaker+corosync) running
> some services over 3 instances cloned by DRBD.
>
> The problem is: when I unplug the ethernet cable the Master/Slave role
> doesn't change so the services cannot start on the server that is well
> connected to the network.
> While if I simulate a connectivity degraded (using IP tables) the switch
> works well.
>
> I attach below my running config and I ask a couple of questions:
>
> - why the attribute value of MS resources is "10000"? Is it a default
>   value?
> - How can I fix my problem?
>
> I would like that when I unplug the cable all the MS resources become
> MASTER on the well-connected NODE.
>
> Thank you for your help
>
>
> ---------------------------------------------------------------------------
> Configuration
> ---------------------------------------------------------------------------
> node alfa
> node beta
> primitive ClusterIP ocf:heartbeat:IPaddr2 \
>     params ip="192.168.3.10" cidr_netmask="24" nic="eth0" iflabel="0" \
>     op monitor interval="2s"
> primitive WebSite ocf:heartbeat:apache \
>     params configfile="/etc/httpd/conf/httpd.conf" \
>     op monitor start-delay="15s" interval="60s" \
>     op start interval="0" timeout="40s" \
>     op stop interval="0" timeout="60s"
> primitive drbd_freeswitch ocf:linbit:drbd \
>     params drbd_resource="r2" \
>     op monitor interval="30s" \
>     op start interval="15" timeout="240s" \
>     op stop interval="0" timeout="100s"
> primitive drbd_logAlfa ocf:linbit:drbd \
>     params drbd_resource="r0" \
>     op monitor interval="30s" \
>     op start interval="15" timeout="240s" \
>     op stop interval="0" timeout="100s"
> primitive drbd_logBeta ocf:linbit:drbd \
>     params drbd_resource="r1" \
>     op monitor interval="30s" \
>     op start interval="15" timeout="240s" \
>     op stop interval="0" timeout="100s"
> primitive freeswitch lsb:freeswitch \
>     op monitor interval="60s" \
>     op start interval="0" timeout="90s" \
>     op stop interval="0" timeout="100s"
> primitive fs_drbd_freeswitch ocf:heartbeat:Filesystem \
>     params device="/dev/drbd2" directory="/data" fstype="ext3" \
>     op monitor interval="20s" timeout="40s" \
>     op start interval="15" timeout="60s" \
>     op stop interval="0" timeout="60s"
> primitive fs_drbd_logAlfa ocf:heartbeat:Filesystem \
>     params device="/dev/drbd0" directory="/log_alfa" fstype="ext3" \
>     op monitor interval="20s" timeout="40s" \
>     op start interval="15" timeout="60s" \
>     op stop interval="0" timeout="60s"
> primitive fs_drbd_logBeta ocf:heartbeat:Filesystem \
>     params device="/dev/drbd1" directory="/log_beta" fstype="ext3" \
>     op monitor interval="20s" timeout="40s" \
>     op start interval="15" timeout="60s" \
>     op stop interval="0" timeout="60s"
> primitive pingd ocf:pacemaker:ping \
>     params host_list="alfa beta 192.168.3.100 192.168.3.1" multiplier="1000" attempts="2" \
>     op monitor interval="3s" timeout="60s" \
>     op start interval="0" timeout="60s" \
>     op stop interval="0" timeout="20s"
> primitive resMON ocf:pacemaker:ClusterMon \
>     operations $id="resMON-operations" \
>     op monitor interval="180" timeout="20" \
>     op start interval="0" timeout="90s" \
>     op stop interval="0" timeout="100s" \
>     params htmlfile="/data/srv/www/cluster-info/index.html" extra_options="--snmp-trap 192.168.25.49"
> group gr_freeswitch fs_drbd_freeswitch ClusterIP freeswitch resMON WebSite \
>     meta resource-stickiness="50"
> ms ms_drbd_freeswitch drbd_freeswitch \
>     meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true" globally-unique="false"
> ms ms_drbd_logAlfa drbd_logAlfa \
>     meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true" globally-unique="false"
> ms ms_drbd_logBeta drbd_logBeta \
>     meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true" globally-unique="false"
> clone pingdClone pingd \
>     meta globally-unique="false"
> location lo_gr_freeswitch gr_freeswitch \
>     rule $id="lo_gr_freeswitch-rule" 100: #uname eq alfa \
>     rule $id="lo_gr_freeswitch-rule-0" -25000: not_defined pingd or pingd lte 1000 \
>     rule $id="lo_gr_freeswitch-rule-1" pingd: defined pingd
> location ms_logAlfa__on__alfa ms_drbd_logAlfa \
>     rule $id="ms_logAlfa__on__alfa-rule" $role="master" 2000: #uname eq alfa
> location ms_logBeta__on__beta ms_drbd_logBeta \
>     rule $id="ms_logBeta__on__beta-rule" $role="master" 2000: #uname eq beta
> colocation freeswitch_on_drbd inf: gr_freeswitch ms_drbd_freeswitch:Master
> colocation fs_logAlfa__on__drbd_logAlfa inf: fs_drbd_logAlfa ms_drbd_logAlfa:Master
> colocation fs_logBeta__on__drbd_logBeta inf: fs_drbd_logBeta ms_drbd_logBeta:Master
> order freeswitch_after_drbd inf: ms_drbd_freeswitch:promote gr_freeswitch:start
> order fs_logAlfa__after__drbd_logAlfa inf: ms_drbd_logAlfa:promote fs_drbd_logAlfa:start
> order fs_logBeta__after__drbd_logBeta inf: ms_drbd_logBeta:promote fs_drbd_logBeta:start
> property $id="cib-bootstrap-options" \
>     stonith-enabled="false" \
>     default-resource-stickiness="1" \
>     no-quorum-policy="ignore" \
>     dc-version="1.1.5-1.1.el5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f" \
>     cluster-infrastructure="openais" \
>     expected-quorum-votes="2"
> rsc_defaults $id="rsc-options" \
>     resource-stickiness="1"
>
> ------------------------------------------------------------------------
> Log when both nodes have cable connected
> ------------------------------------------------------------------------
> crm_mon -A1
> ============
> Last updated: Mon May 9 11:24:18 2011
> Stack: openais
> Current DC: alfa - partition with quorum
> Version: 1.1.5-1.1.el5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f
> 2 Nodes configured, 2 expected votes
> 7 Resources configured.
> ============
>
> Online: [ alfa beta ]
>
> Resource Group: gr_freeswitch
>     fs_drbd_freeswitch (ocf::heartbeat:Filesystem): Started alfa
>     ClusterIP (ocf::heartbeat:IPaddr2): Started alfa
>     freeswitch (lsb:freeswitch): Started alfa
>     resMON (ocf::pacemaker:ClusterMon): Started alfa
>     WebSite (ocf::heartbeat:apache): Started alfa
> Master/Slave Set: ms_drbd_logAlfa [drbd_logAlfa]
>     Masters: [ alfa ]
>     Slaves: [ beta ]
> Master/Slave Set: ms_drbd_logBeta [drbd_logBeta]
>     Masters: [ beta ]
>     Slaves: [ alfa ]
> fs_drbd_logAlfa (ocf::heartbeat:Filesystem): Started alfa
> fs_drbd_logBeta (ocf::heartbeat:Filesystem): Started beta
> Master/Slave Set: ms_drbd_freeswitch [drbd_freeswitch]
>     Masters: [ alfa ]
>     Slaves: [ beta ]
> Clone Set: pingdClone [pingd]
>     Started: [ alfa beta ]
>
> Node Attributes:
> * Node alfa:
>     + master-drbd_freeswitch:0 : 10000
>     + master-drbd_logAlfa:0 : 10000
>     + master-drbd_logBeta:0 : 10000
>     + pingd : 4000
> * Node beta:
>     + master-drbd_freeswitch:1 : 10000
>     + master-drbd_logAlfa:1 : 10000
>     + master-drbd_logBeta:1 : 10000
>     + pingd : 4000
>
> ---------------------------------------------------------------
> Log when only "beta" has the cable connected
> ---------------------------------------------------------------
> Last updated: Mon May 9 11:27:09 2011
> Stack: openais
> Current DC: beta - partition WITHOUT quorum
> Version: 1.1.5-1.1.el5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f
> 2 Nodes configured, 2 expected votes
> 7 Resources configured.
> ============
>
> Online: [ beta ]
> OFFLINE: [ alfa ]
>
> Master/Slave Set: ms_drbd_logAlfa [drbd_logAlfa]
>     Slaves: [ beta ]
>     Stopped: [ drbd_logAlfa:1 ]
> Master/Slave Set: ms_drbd_logBeta [drbd_logBeta]
>     Masters: [ beta ]
>     Stopped: [ drbd_logBeta:0 ]
> fs_drbd_logBeta (ocf::heartbeat:Filesystem): Started beta
> Master/Slave Set: ms_drbd_freeswitch [drbd_freeswitch]
>     Slaves: [ beta ]
>     Stopped: [ drbd_freeswitch:1 ]
> Clone Set: pingdClone [pingd]
>     Started: [ beta ]
>     Stopped: [ pingd:0 ]
>
> Node Attributes:
> * Node beta:
>     + master-drbd_freeswitch:0 : 10000
>     + master-drbd_logAlfa:0 : 10000
>     + master-drbd_logBeta:1 : 10000
>     + pingd : 3000 : Connectivity is degraded (Expected=4000)
>
> Failed actions:
>     drbd_logAlfa:0_promote_0 (node=beta, call=1324, rc=-2, status=Timed Out): unknown exec error
>     drbd_freeswitch:0_promote_0 (node=beta, call=1331, rc=-2, status=Timed Out): unknown exec error
>     drbd_logAlfa:1_promote_0 (node=beta, call=1354, rc=-2, status=Timed Out): unknown exec error
>     drbd_freeswitch:1_promote_0 (node=beta, call=1355, rc=-2, status=Timed Out): unknown exec error
>
> _______________________________________________
> Linux-HA mailing list
> [email protected]
> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> See also: http://linux-ha.org/ReportingProblems

Hi,
It might be this line:

order freeswitch_after_drbd inf: ms_drbd_freeswitch:promote gr_freeswitch:start

I would delete it and base the service ordering on something else. I had a similar problem after following a manual I found on the internet: the nodes were stuck in a loop, always trying to start the service.

Bye
Mario
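
A rough sketch of the change suggested above, assuming the crm shell (crmsh) is in use and that the constraint and resource IDs match the configuration posted in the quoted message. It has not been run against this cluster, so treat it as a starting point only:

```shell
# Remove the ordering constraint suspected of causing the start loop
# (ID taken from the posted configuration).
crm configure delete freeswitch_after_drbd

# Clear the failed promote actions so the cluster re-evaluates the resource.
crm resource cleanup ms_drbd_freeswitch

# Show the cluster state once, including node attributes, to check the result.
crm_mon -A1
```

Note that fs_drbd_freeswitch mounts /dev/drbd2, which can only be mounted once DRBD has been promoted on that node, so some replacement ordering for the group is still needed after the delete.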
