Hello,
I've already seen some similar problems on the list and elsewhere on the net, but there was never a real solution for my problem. I don't think I'm the only one who has (or has had) this problem.
I have two servers (CentOS 5), both running heartbeat and ldirectord. On the same servers there are SMTP servers running which should be load balanced by ldirectord. Overall it works very well. But after restarting the real server, heartbeat sets my lo:0 interface down, which is needed to allow connections to the real server (lo:0 is configured with the VIP of the cluster). If I set the interface back up manually, everything works fine again. Here are my configurations:

=======================================
ha.cf
=======================================
logfacility local3
crm on
keepalive 2
deadtime 20
warntime 10
initdead 60
udpport 694
mcast eth1 239.0.0.1 694 1 0
mcast eth2 239.0.0.2 694 1 0
node sgw01.censor.ed
node sgw02.censor.ed

=======================================
ldirectord.cf
=======================================
checktimeout=10
checkinterval=2
autoreload=yes
logfile="local0"
quiescent=yes

# Virtual Service for SMTP
virtual=172.30.101.100:25
        real=172.30.101.101:25 gate 100
        real=172.30.101.102:25 gate 105
        service=smtp
        scheduler=wrr
        protocol=tcp
        checktype=negotiate
        checkport=25

=======================================
cib.xml (converted from my old V1 config)
=======================================
<cib admin_epoch="0" generated="true" have_quorum="true" ignore_dtd="false" num_peers="2" cib_feature_revision="1.3" ccm_transition="4" dc_uuid="2c74e6bd-41e7-416d-b598-14d8592b9cab" epoch="11" num_updates="1" cib-last-written="Wed Nov 14 13:19:37 2007">
  <configuration>
    <crm_config>
      <cluster_property_set id="cib-bootstrap-options">
        <attributes>
          <nvpair id="cib-bootstrap-options-symmetric-cluster" name="symmetric-cluster" value="true"/>
          <nvpair id="cib-bootstrap-options-no-quorum-policy" name="no-quorum-policy" value="stop"/>
          <nvpair id="cib-bootstrap-options-default-resource-stickiness" name="default-resource-stickiness" value="0"/>
          <nvpair id="cib-bootstrap-options-default-resource-failure-stickiness" name="default-resource-failure-stickiness" value="0"/>
          <nvpair id="cib-bootstrap-options-stonith-enabled" name="stonith-enabled" value="false"/>
          <nvpair id="cib-bootstrap-options-stonith-action" name="stonith-action" value="reboot"/>
          <nvpair id="cib-bootstrap-options-stop-orphan-resources" name="stop-orphan-resources" value="true"/>
          <nvpair id="cib-bootstrap-options-stop-orphan-actions" name="stop-orphan-actions" value="true"/>
          <nvpair id="cib-bootstrap-options-remove-after-stop" name="remove-after-stop" value="false"/>
          <nvpair id="cib-bootstrap-options-short-resource-names" name="short-resource-names" value="true"/>
          <nvpair id="cib-bootstrap-options-transition-idle-timeout" name="transition-idle-timeout" value="5min"/>
          <nvpair id="cib-bootstrap-options-default-action-timeout" name="default-action-timeout" value="15s"/>
          <nvpair id="cib-bootstrap-options-is-managed-default" name="is-managed-default" value="true"/>
        </attributes>
      </cluster_property_set>
    </crm_config>
    <nodes>
      <node id="ba53fc53-3830-40f1-af7f-e87456c85c16" uname="sgw01.censor.ed" type="normal">
        <instance_attributes id="nodes-ba53fc53-3830-40f1-af7f-e87456c85c16">
          <attributes>
            <nvpair id="standby-ba53fc53-3830-40f1-af7f-e87456c85c16" name="standby" value="off"/>
          </attributes>
        </instance_attributes>
      </node>
      <node id="2c74e6bd-41e7-416d-b598-14d8592b9cab" uname="sgw02.censor.ed" type="normal"/>
    </nodes>
    <resources>
      <group id="group_1">
        <primitive class="heartbeat" id="ldirectord_1" provider="heartbeat" type="ldirectord">
          <operations>
            <op id="ldirectord_1_mon" interval="120s" name="monitor" timeout="60s" start_delay="0" disabled="false" role="Started"/>
          </operations>
          <instance_attributes id="ldirectord_1_inst_attr">
            <attributes>
              <nvpair id="ldirectord_1_attr_1" name="1" value="ldirectord.cf"/>
              <nvpair id="4707b7c0-7dbd-41a2-a812-ab98206b2508" name="target_role" value="started"/>
            </attributes>
          </instance_attributes>
        </primitive>
        <primitive class="heartbeat" id="LVSSyncDaemonSwap_2" provider="heartbeat" type="LVSSyncDaemonSwap">
          <operations>
            <op id="LVSSyncDaemonSwap_2_mon" interval="120s" name="monitor" timeout="60s"/>
          </operations>
          <instance_attributes id="LVSSyncDaemonSwap_2_inst_attr">
            <attributes>
              <nvpair id="LVSSyncDaemonSwap_2_attr_1" name="1" value="master"/>
            </attributes>
          </instance_attributes>
        </primitive>
        <primitive class="ocf" provider="heartbeat" id="IPaddr_172_30_101_100" type="IPaddr">
          <operations>
            <op interval="5s" name="monitor" id="IPaddr_172_30_101_100_mon" timeout="10s"/>
          </operations>
          <instance_attributes id="IPaddr_172_30_101_100_inst_attr">
            <attributes>
              <nvpair id="IPaddr_172_30_101_100_attr_0" name="ip" value="172.30.101.100"/>
              <nvpair id="IPaddr_172_30_101_100_attr_1" name="netmask" value="24"/>
              <nvpair id="IPaddr_172_30_101_100_attr_2" name="nic" value="eth0"/>
              <nvpair id="IPaddr_172_30_101_100_attr_3" name="broadcast" value="172.30.101.255"/>
            </attributes>
          </instance_attributes>
        </primitive>
      </group>
    </resources>
    <constraints>
      <rsc_location id="rsc_location_group_1" rsc="group_1">
        <rule id="prefered_location_group_1" score="100">
          <expression attribute="#uname" id="prefered_location_group_1_expr" operation="eq" value="sgw01.censor.ed"/>
        </rule>
      </rsc_location>
    </constraints>
  </configuration>
</cib>

=======================================
Output from crm_mon
=======================================
============
Last updated: Wed Nov 14 13:59:38 2007
Current DC: sgw02.censor.ed (2c74e6bd-41e7-416d-b598-14d8592b9cab)
2 Nodes configured.
1 Resources configured.
============
Node: sgw01.censor.ed (ba53fc53-3830-40f1-af7f-e87456c85c16): online
Node: sgw02.censor.ed (2c74e6bd-41e7-416d-b598-14d8592b9cab): online
Resource Group: group_1
    ldirectord_1 (heartbeat:ldirectord): Started sgw01.censor.ed FAILED
    LVSSyncDaemonSwap_2 (heartbeat:LVSSyncDaemonSwap): Started sgw01.icrcom.ch
    IPaddr_172_30_101_100 (heartbeat::ocf:IPaddr): Started sgw01.censor.ed

Failed actions:
    ldirectord_1_monitor_120000 (node=sgw01.censor.ed, call=17, rc=7): complete

=======================================
And here are some log messages:
=======================================
Currently active ldirector:
===========================
Nov 14 13:16:48 sgw01 lrmd: [18224]: info: RA output: (ldirectord_1:monitor:stderr) ldirectord for /etc/ha.d/ldirectord.cf is running with pid: 18281
Nov 14 13:16:48 sgw01 lrmd: [18224]: WARN: There is something wrong: the first line isn't read in. Maybe the heartbeat does not ouput string correctly for status operation. Or the code (myself) is wrong.
Nov 14 13:16:48 sgw01 lrmd: [18224]: debug: RA output [] didn't match any pattern

(I don't know what the reason could be. The config looks fine...)

Nov 14 13:19:44 sgw01 lrmd: [18224]: CRIT: read_pipe:3480 Attempt to read from closed file descriptor 10.

Real server:
============
Nov 14 11:34:01 sgw02 IPaddr[3032]: ERROR: 172.30.101.100 is running an interface (lo) instead of the configured one (eth0)
Nov 14 11:34:01 sgw02 crmd: [2826]: ERROR: process_lrm_event: LRM operation IPaddr_172_30_101_100_monitor_0 (call=4, rc=1) Error unknown error

One idea I had was to define a resource for the lo:0 interface which runs on all servers where the VIP resource is not running, but I don't know how to do that, and it may not be the clean way anyway. So hopefully there is someone out there who can help me.
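For reference, this is roughly what I do by hand to get the real server working again. It is a sketch of the usual LVS direct-routing realserver setup, not something heartbeat does for me: the VIP (172.30.101.100, from my ldirectord.cf above) goes on lo:0 with a /32 netmask, and the standard arp_ignore/arp_announce sysctls keep the real server from answering ARP for the VIP, so only the director's MAC is seen on the LAN. The script itself and its placement are my own assumption, not part of any distribution:

```shell
#!/bin/sh
# Sketch of an LVS-DR realserver setup (VIP taken from ldirectord.cf above;
# everything else here is illustrative, not from an existing init script).
VIP=172.30.101.100

# Bring the VIP up on a loopback alias with a host netmask so the real
# server accepts packets addressed to the VIP without routing them.
ifconfig lo:0 "$VIP" netmask 255.255.255.255 broadcast "$VIP" up

# Suppress ARP replies for addresses configured on lo, so the director
# (which owns the VIP on eth0) stays the only ARP responder for it.
echo 1 > /proc/sys/net/ipv4/conf/lo/arp_ignore
echo 2 > /proc/sys/net/ipv4/conf/lo/arp_announce
echo 1 > /proc/sys/net/ipv4/conf/all/arp_ignore
echo 2 > /proc/sys/net/ipv4/conf/all/arp_announce
```

The question remains how to keep heartbeat (or its IPaddr monitor) from tearing this lo:0 alias down again after a restart of the real server.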
Thank you very much
Urs

_______________________________________________
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems