Hi,

On Tue, Sep 22, 2009 at 12:06:48PM +0200, Urs Weiss wrote:
> Hello
>
> I'm trying to get this working since two days now, but ldirectord
> somehow does not work. Had no problem with it on older Heartbeat 2. Hope
> you can give me a hint.
>
>
> My setup:
> - CentOS 5.3
> - HA packages from
> "http://download.opensuse.org/repositories/server:/ha-clustering/RHEL_$releasever/":
> - heartbeat-3.0.0-33.2
> - openais-0.80.5-15.1
> - libopenais2-0.80.5-15.1
> - pacemaker-1.0.5-4.1
> - pacemaker-libs-1.0.5-4.1
>
>
> The goal:
> - ldirectord with failover to second node
>
>
> The current config looks like this:
> ====================================
> crm(live)# configure show
> node ovz01.icrcom.ch
> node ovz04.icrcom.ch
> primitive failover-ip ocf:heartbeat:IPaddr \
> params ip="172.30.101.110" nic="eth0" netmask="24"
> broadcast="172.30.101.255" \
> op monitor interval="5s" timeout="15s"
> primitive ldirectord_1 ocf:heartbeat:ldirectord \
> params 1="ldirectord.cf" target_role="started" \
> op monitor interval="120s" role="Started" timeout="60s" start_delay="0"
> disabled="false"
> property $id="cib-bootstrap-options" \
> dc-version="1.0.5-462f1569a43740667daf7b0f6b521742e9eb8fa7" \
> cluster-infrastructure="Heartbeat" \
> symetric-cluster="true" \
> stonith-enabled="false" \
> no-quorum-policy="stop" \
> default-resource-stickiness="0" \
> default-resource-failure-stickiness="0" \
> stop-orphan-actions="true" \
> stop-orphan-resources="true" \
> remove-after-stop="false" \
> short-resource-names="true" \
> transition-idle-timeout="5min" \
> default-action-timeout="15s" \
> is-managed-default="true" \
> expected-quorum-votes="2" \
> last-lrm-refresh="1253609925"
> ====================================
>
>
> The IP looks good, but not ldirectord:
> ====================================
> # crm_mon --one-shot
>
> ============
> Last updated: Tue Sep 22 11:57:06 2009
> Stack: openais
> Current DC: ovz04.icrcom.ch - partition with quorum
> Version: 1.0.5-462f1569a43740667daf7b0f6b521742e9eb8fa7
> 2 Nodes configured, 2 expected votes
> 2 Resources configured.
> ============
>
> Online: [ ovz04.icrcom.ch ovz01.icrcom.ch ]
>
> failover-ip (ocf::heartbeat:IPaddr): Started ovz04.icrcom.ch
> ldirectord_1 (ocf::heartbeat:ldirectord) Started [ ovz04.icrcom.ch
> ovz01.icrcom.ch ]
>
> Failed actions:
> ldirectord_1_monitor_0 (node=ovz04.icrcom.ch, call=3, rc=1,
> status=complete): unknown error
> ldirectord_1_stop_0 (node=ovz04.icrcom.ch, call=4, rc=1,
> status=complete): unknown error
> ldirectord_1_monitor_0 (node=ovz01.icrcom.ch, call=3, rc=1,
> status=complete): unknown error
> ====================================
>
>
> From the logs:
> ====================================
> Sep 22 11:56:40 ovz04 pengine: [12685]: info: determine_online_status:
> Node ovz04.icrcom.ch is online
> Sep 22 11:56:40 ovz04 pengine: [12685]: info: unpack_rsc_op:
> ldirectord_1_monitor_0 on ovz04.icrcom.ch returned 1 (unknown error)
> instead of the expected value: 7 (not running)
> Sep 22 11:56:40 ovz04 pengine: [12685]: WARN: unpack_rsc_op: Processing
> failed op ldirectord_1_monitor_0 on ovz04.icrcom.ch: unknown error
> Sep 22 11:56:40 ovz04 pengine: [12685]: info: unpack_rsc_op:
> ldirectord_1_stop_0 on ovz04.icrcom.ch returned 1 (unknown error)
> instead of the expected value: 0 (ok)
> Sep 22 11:56:40 ovz04 crmd: [12686]: info: process_lrm_event: LRM
> operation failover-ip_start_0 (call=5, rc=0, cib-update=60,
> confirmed=true) complete ok
> Sep 22 11:56:40 ovz04 crmd: [12686]: info: match_graph_event: Action
> failover-ip_start_0 (6) confirmed on ovz04.icrcom.ch (rc=0)
> Sep 22 11:56:40 ovz04 crmd: [12686]: info: run_graph:
> ====================================================
> Sep 22 11:56:40 ovz04 crmd: [12686]: notice: run_graph: Transition 6
> (Complete=2, Pending=0, Fired=0, Skipped=1, Incomplete=0,
> Source=/var/lib/pengine/pe-warn-336.bz2): Stopped
> Sep 22 11:56:40 ovz04 crmd: [12686]: info: te_graph_trigger: Transition
> 6 is now complete
> Sep 22 11:56:40 ovz04 pengine: [12685]: WARN: unpack_rsc_op: Processing
> failed op ldirectord_1_stop_0 on ovz04.icrcom.ch: unknown error
> Sep 22 11:56:40 ovz04 pengine: [12685]: info: native_add_running:
> resource ldirectord_1 isnt managed
> Sep 22 11:56:40 ovz04 pengine: [12685]: info: determine_online_status:
> Node ovz01.icrcom.ch is online
> Sep 22 11:56:40 ovz04 pengine: [12685]: info: unpack_rsc_op:
> ldirectord_1_monitor_0 on ovz01.icrcom.ch returned 1 (unknown error)
> instead of the expected value: 7 (not running)
> Sep 22 11:56:40 ovz04 pengine: [12685]: WARN: unpack_rsc_op: Processing
> failed op ldirectord_1_monitor_0 on ovz01.icrcom.ch: unknown error
> ====================================
Look for 'lrmd.*ldirector' in the logs on all nodes where it failed.
That should show you what's happening with the resource.

Thanks,

Dejan

>
>
> Thank you
> Urs
>
> _______________________________________________
> Linux-HA mailing list
> [email protected]
> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> See also: http://linux-ha.org/ReportingProblems
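
P.S. In case it helps, here is a minimal sketch of the log search suggested in this reply. The function name lrmd_ldirectord_lines is made up for illustration, and the default path /var/log/messages is only an assumption; the cluster may log to a different file depending on how syslog and logging are configured on your nodes.

```shell
# Sketch only: print the lrmd log lines that mention ldirectord.
# ASSUMPTION: cluster messages end up in /var/log/messages; pass a
# different file as the first argument if your setup logs elsewhere.
# The function name is hypothetical, not part of any cluster tool.
lrmd_ldirectord_lines() {
    logfile="${1:-/var/log/messages}"
    # -E enables extended regular expressions; the pattern matches any
    # line where "lrmd" is followed later by "ldirector".
    grep -E 'lrmd.*ldirector' "$logfile"
}
```

Run it on each node where the operation failed (here ovz04.icrcom.ch and ovz01.icrcom.ch), for example with `lrmd_ldirectord_lines /var/log/messages`. The pengine lines quoted above only report the return code; the lrmd lines are where the resource agent's own messages usually show up.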
