Hi,

On Tue, Sep 22, 2009 at 12:06:48PM +0200, Urs Weiss wrote:
> Hello
> 
> I have been trying to get this working for two days now, but ldirectord
> somehow does not work. I had no problem with it on the older Heartbeat 2.
> I hope you can give me a hint.
> 
> 
> My setup:
> - CentOS 5.3
> - HA packages from
> "http://download.opensuse.org/repositories/server:/ha-clustering/RHEL_$releasever/":
>   - heartbeat-3.0.0-33.2
>   - openais-0.80.5-15.1
>   - libopenais2-0.80.5-15.1
>   - pacemaker-1.0.5-4.1
>   - pacemaker-libs-1.0.5-4.1
> 
> 
> The goal:
> - ldirectord with failover to second node
> 
> 
> The current config looks like this:
> ====================================
> crm(live)# configure show
> node ovz01.icrcom.ch
> node ovz04.icrcom.ch
> primitive failover-ip ocf:heartbeat:IPaddr \
>       params ip="172.30.101.110" nic="eth0" netmask="24" broadcast="172.30.101.255" \
>       op monitor interval="5s" timeout="15s"
> primitive ldirectord_1 ocf:heartbeat:ldirectord \
>       params 1="ldirectord.cf" target_role="started" \
>       op monitor interval="120s" role="Started" timeout="60s" start_delay="0" disabled="false"
> property $id="cib-bootstrap-options" \
>       dc-version="1.0.5-462f1569a43740667daf7b0f6b521742e9eb8fa7" \
>       cluster-infrastructure="Heartbeat" \
>       symetric-cluster="true" \
>       stonith-enabled="false" \
>       no-quorum-policy="stop" \
>       default-resource-stickiness="0" \
>       default-resource-failure-stickiness="0" \
>       stop-orphan-actions="true" \
>       stop-orphan-resources="true" \
>       remove-after-stop="false" \
>       short-resource-names="true" \
>       transition-idle-timeout="5min" \
>       default-action-timeout="15s" \
>       is-managed-default="true" \
>       expected-quorum-votes="2" \
>       last-lrm-refresh="1253609925"
> ====================================
> 
> 
> The IP looks good, but not ldirectord:
> ====================================
> # crm_mon --one-shot
> 
> ============
> Last updated: Tue Sep 22 11:57:06 2009
> Stack: openais
> Current DC: ovz04.icrcom.ch - partition with quorum
> Version: 1.0.5-462f1569a43740667daf7b0f6b521742e9eb8fa7
> 2 Nodes configured, 2 expected votes
> 2 Resources configured.
> ============
> 
> Online: [ ovz04.icrcom.ch ovz01.icrcom.ch ]
> 
> failover-ip   (ocf::heartbeat:IPaddr):        Started ovz04.icrcom.ch
> ldirectord_1  (ocf::heartbeat:ldirectord) Started [   ovz04.icrcom.ch ovz01.icrcom.ch ]
> 
> Failed actions:
>     ldirectord_1_monitor_0 (node=ovz04.icrcom.ch, call=3, rc=1, status=complete): unknown error
>     ldirectord_1_stop_0 (node=ovz04.icrcom.ch, call=4, rc=1, status=complete): unknown error
>     ldirectord_1_monitor_0 (node=ovz01.icrcom.ch, call=3, rc=1, status=complete): unknown error
> ====================================
> 
> 
> From the logs:
> ====================================
> Sep 22 11:56:40 ovz04 pengine: [12685]: info: determine_online_status: Node ovz04.icrcom.ch is online
> Sep 22 11:56:40 ovz04 pengine: [12685]: info: unpack_rsc_op: ldirectord_1_monitor_0 on ovz04.icrcom.ch returned 1 (unknown error) instead of the expected value: 7 (not running)
> Sep 22 11:56:40 ovz04 pengine: [12685]: WARN: unpack_rsc_op: Processing failed op ldirectord_1_monitor_0 on ovz04.icrcom.ch: unknown error
> Sep 22 11:56:40 ovz04 pengine: [12685]: info: unpack_rsc_op: ldirectord_1_stop_0 on ovz04.icrcom.ch returned 1 (unknown error) instead of the expected value: 0 (ok)
> Sep 22 11:56:40 ovz04 crmd: [12686]: info: process_lrm_event: LRM operation failover-ip_start_0 (call=5, rc=0, cib-update=60, confirmed=true) complete ok
> Sep 22 11:56:40 ovz04 crmd: [12686]: info: match_graph_event: Action failover-ip_start_0 (6) confirmed on ovz04.icrcom.ch (rc=0)
> Sep 22 11:56:40 ovz04 crmd: [12686]: info: run_graph: ====================================================
> Sep 22 11:56:40 ovz04 crmd: [12686]: notice: run_graph: Transition 6 (Complete=2, Pending=0, Fired=0, Skipped=1, Incomplete=0, Source=/var/lib/pengine/pe-warn-336.bz2): Stopped
> Sep 22 11:56:40 ovz04 crmd: [12686]: info: te_graph_trigger: Transition 6 is now complete
> Sep 22 11:56:40 ovz04 pengine: [12685]: WARN: unpack_rsc_op: Processing failed op ldirectord_1_stop_0 on ovz04.icrcom.ch: unknown error
> Sep 22 11:56:40 ovz04 pengine: [12685]: info: native_add_running: resource ldirectord_1 isnt managed
> Sep 22 11:56:40 ovz04 pengine: [12685]: info: determine_online_status: Node ovz01.icrcom.ch is online
> Sep 22 11:56:40 ovz04 pengine: [12685]: info: unpack_rsc_op: ldirectord_1_monitor_0 on ovz01.icrcom.ch returned 1 (unknown error) instead of the expected value: 7 (not running)
> Sep 22 11:56:40 ovz04 pengine: [12685]: WARN: unpack_rsc_op: Processing failed op ldirectord_1_monitor_0 on ovz01.icrcom.ch: unknown error
> ====================================
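For reference when reading these log lines: the `_monitor_0` operations are probes, the one-off monitor Pacemaker runs to find out whether a resource is already active on a node, and the numbers are the standard OCF resource-agent exit codes. A probe on a node where ldirectord is not running should return 7 (OCF_NOT_RUNNING); here the agent returned 1 (OCF_ERR_GENERIC) on both nodes, the following stop on ovz04 also returned 1, and pengine then reports that the resource "isnt managed". The Python sketch below decodes such lines. It assumes each log message has been unwrapped onto a single line, and the function name `decode` is hypothetical, not part of any Pacemaker tool.

```python
import re

# Standard OCF resource-agent exit codes.
OCF_RC = {
    0: "OCF_SUCCESS",
    1: "OCF_ERR_GENERIC",
    2: "OCF_ERR_ARGS",
    3: "OCF_ERR_UNIMPLEMENTED",
    4: "OCF_ERR_PERM",
    5: "OCF_ERR_INSTALLED",
    6: "OCF_ERR_CONFIGURED",
    7: "OCF_NOT_RUNNING",
}

# Matches the (unwrapped) pengine "unpack_rsc_op ... returned X ... expected value: Y" message.
PATTERN = re.compile(
    r"unpack_rsc_op: (?P<op>\S+) on (?P<node>\S+) returned (?P<rc>\d+) "
    r"\(.*?\) instead of the expected value: (?P<expected>\d+)"
)


def decode(line):
    """Hypothetical helper: return (op, node, actual, expected) as OCF names, or None."""
    m = PATTERN.search(line)
    if m is None:
        return None  # not an "unexpected return code" line
    rc, expected = int(m["rc"]), int(m["expected"])
    return (
        m["op"],
        m["node"],
        OCF_RC.get(rc, f"unknown ({rc})"),
        OCF_RC.get(expected, f"unknown ({expected})"),
    )
```

Applied to the first probe failure above, `decode(...)` yields `("ldirectord_1_monitor_0", "ovz04.icrcom.ch", "OCF_ERR_GENERIC", "OCF_NOT_RUNNING")`; the "Processing failed op" WARN lines do not match and return `None`.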

Look for 'lrmd.*ldirector' in the logs on all nodes where it failed. The
lrmd messages should show you what is happening with the resource.
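A minimal sketch of that search, equivalent to running `grep 'lrmd.*ldirector'` over the log on each node. The log location is an assumption: pass the path of whichever file your cluster messages are logged to on that node. The helper name `lrmd_lines` is hypothetical.

```python
import re
import sys

# The suggested pattern: lrmd log lines that mention the ldirectord resource/agent.
LRMD_LDIRECTORD = re.compile(r"lrmd.*ldirector")


def lrmd_lines(lines):
    """Return the lines matching 'lrmd.*ldirector', without trailing newlines."""
    return [line.rstrip("\n") for line in lines if LRMD_LDIRECTORD.search(line)]


# Only runs when a log path is given on the command line (the path is site-specific).
if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], errors="replace") as fh:
        for line in lrmd_lines(fh):
            print(line)
```

Run it on ovz01 and ovz04 alike, since the probe failed on both nodes.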

Thanks,

Dejan

> 
> 
> Thank you
> Urs
> 
> _______________________________________________
> Linux-HA mailing list
> [email protected]
> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> See also: http://linux-ha.org/ReportingProblems