Hello,
I've already seen some similar problems on the list and elsewhere on the net, but there was never a real solution for my problem. I don't think I'm the only one who has (or has had) this problem.
I have two servers (CentOS 5), both running heartbeat and ldirectord. On the same servers there are SMTP servers running which should be load balanced by ldirectord. Overall it works very well. But after restarting the real server, heartbeat sets my lo:0 interface down, which is needed to allow connections to the real server (lo:0 is configured with the VIP of the cluster). If I set the interface back up manually, everything works fine again. Here are my configurations:

=======================================
ha.cf
=======================================
logfacility local3
crm on
keepalive 2
deadtime 20
warntime 10
initdead 60
udpport 694
mcast eth1 239.0.0.1 694 1 0
mcast eth2 239.0.0.2 694 1 0
node sgw01.censor.ed
node sgw02.censor.ed

=======================================
ldirectord.cf
=======================================
checktimeout=10
checkinterval=2
autoreload=yes
logfile="local0"
quiescent=yes

# Virtual Service for SMTP
virtual=172.30.101.100:25
        real=172.30.101.101:25 gate 100
        real=172.30.101.102:25 gate 105
        service=smtp
        scheduler=wrr
        protocol=tcp
        checktype=negotiate
        checkport=25

=======================================
cib.xml (converted from my old V1 config)
=======================================
<cib admin_epoch="0" generated="true" have_quorum="true" ignore_dtd="false" num_peers="2" cib_feature_revision="1.3" ccm_transition="4" dc_uuid="2c74e6bd-41e7-416d-b598-14d8592b9cab" epoch="11" num_updates="1" cib-last-written="Wed Nov 14 13:19:37 2007">
  <configuration>
    <crm_config>
      <cluster_property_set id="cib-bootstrap-options">
        <attributes>
          <nvpair id="cib-bootstrap-options-symmetric-cluster" name="symmetric-cluster" value="true"/>
          <nvpair id="cib-bootstrap-options-no-quorum-policy" name="no-quorum-policy" value="stop"/>
          <nvpair id="cib-bootstrap-options-default-resource-stickiness" name="default-resource-stickiness" value="0"/>
          <nvpair id="cib-bootstrap-options-default-resource-failure-stickiness" name="default-resource-failure-stickiness" value="0"/>
          <nvpair id="cib-bootstrap-options-stonith-enabled" name="stonith-enabled" value="false"/>
          <nvpair id="cib-bootstrap-options-stonith-action" name="stonith-action" value="reboot"/>
          <nvpair id="cib-bootstrap-options-stop-orphan-resources" name="stop-orphan-resources" value="true"/>
          <nvpair id="cib-bootstrap-options-stop-orphan-actions" name="stop-orphan-actions" value="true"/>
          <nvpair id="cib-bootstrap-options-remove-after-stop" name="remove-after-stop" value="false"/>
          <nvpair id="cib-bootstrap-options-short-resource-names" name="short-resource-names" value="true"/>
          <nvpair id="cib-bootstrap-options-transition-idle-timeout" name="transition-idle-timeout" value="5min"/>
          <nvpair id="cib-bootstrap-options-default-action-timeout" name="default-action-timeout" value="15s"/>
          <nvpair id="cib-bootstrap-options-is-managed-default" name="is-managed-default" value="true"/>
        </attributes>
      </cluster_property_set>
    </crm_config>
    <nodes>
      <node id="ba53fc53-3830-40f1-af7f-e87456c85c16" uname="sgw01.censor.ed" type="normal">
        <instance_attributes id="nodes-ba53fc53-3830-40f1-af7f-e87456c85c16">
          <attributes>
            <nvpair id="standby-ba53fc53-3830-40f1-af7f-e87456c85c16" name="standby" value="off"/>
          </attributes>
        </instance_attributes>
      </node>
      <node id="2c74e6bd-41e7-416d-b598-14d8592b9cab" uname="sgw02.censor.ed" type="normal"/>
    </nodes>
    <resources>
      <group id="group_1">
        <primitive class="heartbeat" id="ldirectord_1" provider="heartbeat" type="ldirectord">
          <operations>
            <op id="ldirectord_1_mon" interval="120s" name="monitor" timeout="60s" start_delay="0" disabled="false" role="Started"/>
          </operations>
          <instance_attributes id="ldirectord_1_inst_attr">
            <attributes>
              <nvpair id="ldirectord_1_attr_1" name="1" value="ldirectord.cf"/>
              <nvpair id="4707b7c0-7dbd-41a2-a812-ab98206b2508" name="target_role" value="started"/>
            </attributes>
          </instance_attributes>
        </primitive>
        <primitive class="heartbeat" id="LVSSyncDaemonSwap_2" provider="heartbeat" type="LVSSyncDaemonSwap">
          <operations>
            <op id="LVSSyncDaemonSwap_2_mon" interval="120s" name="monitor" timeout="60s"/>
          </operations>
          <instance_attributes id="LVSSyncDaemonSwap_2_inst_attr">
            <attributes>
              <nvpair id="LVSSyncDaemonSwap_2_attr_1" name="1" value="master"/>
            </attributes>
          </instance_attributes>
        </primitive>
        <primitive class="ocf" provider="heartbeat" id="IPaddr_172_30_101_100" type="IPaddr">
          <operations>
            <op interval="5s" name="monitor" id="IPaddr_172_30_101_100_mon" timeout="10s"/>
          </operations>
          <instance_attributes id="IPaddr_172_30_101_100_inst_attr">
            <attributes>
              <nvpair id="IPaddr_172_30_101_100_attr_0" name="ip" value="172.30.101.100"/>
              <nvpair id="IPaddr_172_30_101_100_attr_1" name="netmask" value="24"/>
              <nvpair id="IPaddr_172_30_101_100_attr_2" name="nic" value="eth0"/>
              <nvpair id="IPaddr_172_30_101_100_attr_3" name="broadcast" value="172.30.101.255"/>
            </attributes>
          </instance_attributes>
        </primitive>
      </group>
    </resources>
    <constraints>
      <rsc_location id="rsc_location_group_1" rsc="group_1">
        <rule id="prefered_location_group_1" score="100">
          <expression attribute="#uname" id="prefered_location_group_1_expr" operation="eq" value="sgw01.censor.ed"/>
        </rule>
      </rsc_location>
    </constraints>
  </configuration>
</cib>

=======================================
Output from crm_mon
=======================================
============
Last updated: Wed Nov 14 13:59:38 2007
Current DC: sgw02.censor.ed (2c74e6bd-41e7-416d-b598-14d8592b9cab)
2 Nodes configured.
1 Resources configured.
============
Node: sgw01.censor.ed (ba53fc53-3830-40f1-af7f-e87456c85c16): online
Node: sgw02.censor.ed (2c74e6bd-41e7-416d-b598-14d8592b9cab): online
Resource Group: group_1
    ldirectord_1 (heartbeat:ldirectord): Started sgw01.censor.ed FAILED
    LVSSyncDaemonSwap_2 (heartbeat:LVSSyncDaemonSwap): Started sgw01.icrcom.ch
    IPaddr_172_30_101_100 (heartbeat::ocf:IPaddr): Started sgw01.censor.ed

Failed actions:
    ldirectord_1_monitor_120000 (node=sgw01.censor.ed, call=17, rc=7): complete

=======================================
And here are some log messages:
=======================================
Currently active ldirector:
===========================
Nov 14 13:16:48 sgw01 lrmd: [18224]: info: RA output: (ldirectord_1:monitor:stderr) ldirectord for /etc/ha.d/ldirectord.cf is running with pid: 18281
Nov 14 13:16:48 sgw01 lrmd: [18224]: WARN: There is something wrong: the first line isn't read in. Maybe the heartbeat does not ouput string correctly for status operation. Or the code (myself) is wrong.
Nov 14 13:16:48 sgw01 lrmd: [18224]: debug: RA output [] didn't match any pattern

(I don't know what the reason could be. The config looks fine...)

Nov 14 13:19:44 sgw01 lrmd: [18224]: CRIT: read_pipe:3480 Attempt to read from closed file descriptor 10.

Real server:
============
Nov 14 11:34:01 sgw02 IPaddr[3032]: ERROR: 172.30.101.100 is running an interface (lo) instead of the configured one (eth0)
Nov 14 11:34:01 sgw02 crmd: [2826]: ERROR: process_lrm_event: LRM operation IPaddr_172_30_101_100_monitor_0 (call=4, rc=1) Error unknown error

One idea I had was to define a resource for the lo:0 interface which runs on all servers where the VIP resource is not running, but I don't know how to do that, and it may not be the clean way anyway. So hopefully there is someone out there who can help me.
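For reference, this is roughly what I do by hand to get the real server working again. It is a sketch of the usual LVS direct-routing realserver setup, not something heartbeat does for me: the VIP (172.30.101.100, from my ldirectord.cf above) goes on lo:0 with a /32 netmask, and the standard arp_ignore/arp_announce sysctls keep the real server from answering ARP for the VIP, so only the director's MAC is seen on the LAN. The script itself and its placement are my own assumption, not part of any distribution:

```shell
#!/bin/sh
# Sketch of an LVS-DR realserver setup (VIP taken from ldirectord.cf above;
# everything else here is illustrative, not from an existing init script).
VIP=172.30.101.100

# Bring the VIP up on a loopback alias with a host netmask so the real
# server accepts packets addressed to the VIP without routing them.
ifconfig lo:0 "$VIP" netmask 255.255.255.255 broadcast "$VIP" up

# Suppress ARP replies for addresses configured on lo, so the director
# (which owns the VIP on eth0) stays the only ARP responder for it.
echo 1 > /proc/sys/net/ipv4/conf/lo/arp_ignore
echo 2 > /proc/sys/net/ipv4/conf/lo/arp_announce
echo 1 > /proc/sys/net/ipv4/conf/all/arp_ignore
echo 2 > /proc/sys/net/ipv4/conf/all/arp_announce
```

The question remains how to keep heartbeat (or its IPaddr monitor) from tearing this lo:0 alias down again after a restart of the real server.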
Thank you very much
Urs

_______________________________________________
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems