OK, so there's something I probably don't understand. Each node should have the right privileges (if we're talking about filesystem permissions), because when one of my two nodes failed (node1), the resource (vm1) did start on the available node (node2). The problem happens when the failed node comes back online (node1): the resource (vm1) is supposed to shut down on node2 and restart on node1, isn't it?
We already tried this setup with a 32-bit SLES and everything worked. I just want to know where the problem can be. Is it my configuration? It's supposed to be exactly the same as my old setup. Is it the 64-bit version of SLES? When I set, in the cluster defaults, symmetric_cluster = yes and default_resource_stickiness = INFINITY, and I add a location constraint with score INFINITY and the expression #uname eq node1, isn't my resource supposed to go back to its original node, the way the auto_failback option worked in Heartbeat v1? I'm sorry if my previous post was not clear.
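
(To be concrete, here is roughly what I mean in the CIB -- a minimal sketch from memory, with abbreviated ids and the placeholder names vm1/node1 from my example above; the exact attribute spelling may differ between Heartbeat 2.x releases:)

  <crm_config>
    <cluster_property_set id="cib-bootstrap-options">
      <attributes>
        <!-- resources may run on any node by default -->
        <nvpair id="opt-symmetric" name="symmetric_cluster" value="true"/>
        <!-- a running resource strongly prefers to stay where it is -->
        <nvpair id="opt-stickiness" name="default_resource_stickiness" value="INFINITY"/>
      </attributes>
    </cluster_property_set>
  </crm_config>

  <constraints>
    <!-- pin vm1 to node1 whenever node1 is available -->
    <rsc_location id="loc-vm1-node1" rsc="resource_vm1">
      <rule id="loc-vm1-node1-rule" score="INFINITY">
        <expression id="loc-vm1-node1-expr" attribute="#uname" operation="eq" value="node1"/>
      </rule>
    </rsc_location>
  </constraints>

On 5/8/07, Andrew Beekhof <[EMAIL PROTECTED]> wrote: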
On 5/8/07, Rene Purcell <[EMAIL PROTECTED]> wrote:
> On 5/8/07, Rene Purcell <[EMAIL PROTECTED]> wrote:
> >
> > On 5/8/07, Andrew Beekhof <[EMAIL PROTECTED]> wrote:
> > >
> > > grep ERROR logfile
> > >
> > > try this for starters:
> > >
> > > May 7 16:31:41 qclsles01 lrmd: [5020]: info: RA output: (resource_qclvmsles02:stop:stderr) Error: the domain 'resource_qclvmsles02' does not exist.
> > > May 7 16:31:41 qclsles01 lrmd: [5020]: info: RA output: (resource_qclvmsles02:stop:stdout) Domain resource_qclvmsles02 terminated
> > > May 7 16:31:41 qclsles01 crmd: [22028]: WARN: process_lrm_event: lrm.c LRM operation (35) stop_0 on resource_qclvmsles02 Error: (4) insufficient privileges
> >
> > Yup, I saw that.. it's weird. Heartbeat shuts the VM down, then reports
> > these errors.. and if I clean up the resource, it restarts on the correct
> > node.. there must be something I missed, lol.
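> >
> > (For the record, the cleanup I do is roughly the following; the exact
> > crm_resource flags may differ between Heartbeat 2.x releases:)
> >
> >   # forget the failed stop_0 for this resource on the node that logged
> >   # it, so the CRM re-probes the resource's state there
> >   crm_resource -C -r resource_qclvmsles02 -H qclsles01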
> > > On 5/7/07, Rene Purcell <[EMAIL PROTECTED]> wrote:
> > > >
> > > > I would like to know if someone has tried the Novell setup described in
> > > > "http://www.novell.com/linux/technical_library/has.pdf" on an x86_64
> > > > arch. I've tested this setup on a classic x86 arch and everything was
> > > > OK... but I double-checked my config, everything looks good, and my VM
> > > > never starts on its original node when it comes back online... and I
> > > > can't find why!
> > > >
> > > > Here's the log from when my node1 comes back.. we can see the VM
> > > > shutting down, and after that nothing happens on the other node..
> > > >
> > > > May 7 16:31:25 qclsles01 cib: [22024]: info: cib_diff_notify: notify.c Update (client: 6403, call:13): 0.65.1020 -> 0.65.1021 (ok)
> > > > May 7 16:31:25 qclsles01 tengine: [22591]: info: te_update_diff: callbacks.c Processing diff (cib_update): 0.65.1020 -> 0.65.1021
> > > > May 7 16:31:25 qclsles01 tengine: [22591]: info: extract_event: events.c Aborting on transient_attributes changes
> > > > May 7 16:31:25 qclsles01 tengine: [22591]: info: update_abort_priority: utils.c Abort priority upgraded to 1000000
> > > > May 7 16:31:25 qclsles01 tengine: [22591]: info: update_abort_priority: utils.c Abort action 0 superceeded by 2
> > > > May 7 16:31:26 qclsles01 cib: [22024]: info: activateCibXml: io.c CIB size is 161648 bytes (was 158548)
> > > > May 7 16:31:26 qclsles01 cib: [22024]: info: cib_diff_notify: notify.c Update (client: 6403, call:14): 0.65.1021 -> 0.65.1022 (ok)
> > > > May 7 16:31:26 qclsles01 haclient: on_event:evt:cib_changed
> > > > May 7 16:31:26 qclsles01 tengine: [22591]: info: te_update_diff: callbacks.c Processing diff (cib_update): 0.65.1021 -> 0.65.1022
> > > > May 7 16:31:26 qclsles01 tengine: [22591]: info: match_graph_event: events.c Action resource_qclvmsles02_stop_0 (9) confirmed
> > > > May 7 16:31:26 qclsles01 cib: [25889]: info: write_cib_contents: io.c Wrote version 0.65.1022 of the CIB to disk (digest: e71c271759371d44c4bad24d50b2421d)
> > > > May 7 16:31:39 qclsles01 kernel: xenbr0: port 3(vif12.0) entering disabled state
> > > > May 7 16:31:39 qclsles01 kernel: device vif12.0 left promiscuous mode
> > > > May 7 16:31:39 qclsles01 kernel: xenbr0: port 3(vif12.0) entering disabled state
> > > > May 7 16:31:39 qclsles01 logger: /etc/xen/scripts/vif-bridge: offline XENBUS_PATH=backend/vif/12/0
> > > > May 7 16:31:40 qclsles01 logger: /etc/xen/scripts/block: remove XENBUS_PATH=backend/vbd/12/768
> > > > May 7 16:31:40 qclsles01 logger: /etc/xen/scripts/block: remove XENBUS_PATH=backend/vbd/12/832
> > > > May 7 16:31:40 qclsles01 logger: /etc/xen/scripts/block: remove XENBUS_PATH=backend/vbd/12/5632
> > > > May 7 16:31:40 qclsles01 logger: /etc/xen/scripts/vif-bridge: brctl delif xenbr0 vif12.0 failed
> > > > May 7 16:31:40 qclsles01 logger: /etc/xen/scripts/vif-bridge: ifconfig vif12.0 down failed
> > > > May 7 16:31:40 qclsles01 logger: /etc/xen/scripts/vif-bridge: Successful vif-bridge offline for vif12.0, bridge xenbr0.
> > > > May 7 16:31:40 qclsles01 logger: /etc/xen/scripts/xen-hotplug-cleanup: XENBUS_PATH=backend/vbd/12/5632
> > > > May 7 16:31:40 qclsles01 logger: /etc/xen/scripts/xen-hotplug-cleanup: XENBUS_PATH=backend/vbd/12/768
> > > > May 7 16:31:40 qclsles01 ifdown: vif12.0
> > > > May 7 16:31:40 qclsles01 logger: /etc/xen/scripts/xen-hotplug-cleanup: XENBUS_PATH=backend/vif/12/0
> > > > May 7 16:31:40 qclsles01 logger: /etc/xen/scripts/xen-hotplug-cleanup: XENBUS_PATH=backend/vbd/12/832
> > > > May 7 16:31:40 qclsles01 ifdown: Interface not available and no configuration found.
> > > > May 7 16:31:41 qclsles01 lrmd: [5020]: info: RA output: (resource_qclvmsles02:stop:stderr) Error: the domain 'resource_qclvmsles02' does not exist.
> > > > May 7 16:31:41 qclsles01 lrmd: [5020]: info: RA output: (resource_qclvmsles02:stop:stdout) Domain resource_qclvmsles02 terminated
> > > > May 7 16:31:41 qclsles01 crmd: [22028]: WARN: process_lrm_event: lrm.c LRM operation (35) stop_0 on resource_qclvmsles02 Error: (4) insufficient privileges
> > > > May 7 16:31:41 qclsles01 cib: [22024]: info: activateCibXml: io.c CIB size is 164748 bytes (was 161648)
> > > > May 7 16:31:41 qclsles01 crmd: [22028]: info: do_state_transition: fsa.c qclsles01: State transition S_TRANSITION_ENGINE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_IPC_MESSAGE origin=route_message ]
> > > > May 7 16:31:41 qclsles01 tengine: [22591]: info: te_update_diff: callbacks.c Processing diff (cib_update): 0.65.1022 -> 0.65.1023
> > > > May 7 16:31:41 qclsles01 cib: [22024]: info: cib_diff_notify: notify.c Update (client: 22028, call:100): 0.65.1022 -> 0.65.1023 (ok)
> > > > May 7 16:31:41 qclsles01 crmd: [22028]: info: do_state_transition: fsa.c All 2 cluster nodes are eligable to run resources.
> > > > May 7 16:31:41 qclsles01 tengine: [22591]: ERROR: match_graph_event: events.c Action resource_qclvmsles02_stop_0 on qclsles01 failed (target: 0 vs. rc: 4): Error
> > > > May 7 16:31:41 qclsles01 tengine: [22591]: info: match_graph_event: events.c Action resource_qclvmsles02_stop_0 (10) confirmed
> > > > May 7 16:31:41 qclsles01 tengine: [22591]: info: run_graph: graph.c ====================================================
> > > > May 7 16:31:41 qclsles01 tengine: [22591]: notice: run_graph: graph.c Transition 12: (Complete=3, Pending=0, Fired=0, Skipped=2, Incomplete=0)
> > > > May 7 16:31:41 qclsles01 haclient: on_event:evt:cib_changed
> > > > May 7 16:31:41 qclsles01 cib: [26190]: info: write_cib_contents: io.c Wrote version 0.65.1023 of the CIB to disk (digest: c80326e44b5a106fe9a384240c4a3cc9)
> > > > May 7 16:31:41 qclsles01 pengine: [22592]: info: process_pe_message: [generation] <cib generated="true" admin_epoch="0" have_quorum="true" num_peers="2" cib_feature_revision="1.3" ccm_transition="10" dc_uuid="46ef9c7b-5f6e-4cc0-a0bb-94227b605170" epoch="65" num_updates="1023"/>
> > > > May 7 16:31:41 qclsles01 pengine: [22592]: WARN: unpack_config: unpack.c No value specified for cluster preference: default_action_timeout
> > > > May 7 16:31:41 qclsles01 pengine: [22592]: info: unpack_config: unpack.c Default stickiness: 1000000
> > > > May 7 16:31:41 qclsles01 pengine: [22592]: info: unpack_config: unpack.c Default failure stickiness: -500
> > > > May 7 16:31:41 qclsles01 pengine: [22592]: info: unpack_config: unpack.c STONITH of failed nodes is disabled
> > > > May 7 16:31:41 qclsles01 pengine: [22592]: info: unpack_config: unpack.c STONITH will reboot nodes
> > > > May 7 16:31:41 qclsles01 pengine: [22592]: info: unpack_config: unpack.c Cluster is symmetric - resources can run anywhere by default
> > > > May 7 16:31:41 qclsles01 pengine: [22592]: info: unpack_config: unpack.c On loss of CCM Quorum: Stop ALL resources
> > > > May 7 16:31:41 qclsles01 pengine: [22592]: info: unpack_config: unpack.c Orphan resources are stopped
> > > > May 7 16:31:41 qclsles01 pengine: [22592]: info: unpack_config: unpack.c Orphan resource actions are stopped
> > > > May 7 16:31:41 qclsles01 pengine: [22592]: WARN: unpack_config: unpack.c No value specified for cluster preference: remove_after_stop
> > > > May 7 16:31:41 qclsles01 pengine: [22592]: info: unpack_config: unpack.c Stopped resources are removed from the status section: false
> > > > May 7 16:31:41 qclsles01 pengine: [22592]: info: unpack_config: unpack.c By default resources are managed
> > > > May 7 16:31:41 qclsles01 pengine: [22592]: info: determine_online_status: unpack.c Node qclsles02 is online
> > > > May 7 16:31:41 qclsles01 pengine: [22592]: info: determine_online_status: unpack.c Node qclsles01 is online
> > > > May 7 16:31:41 qclsles01 pengine: [22592]: WARN: unpack_rsc_op: unpack.c Processing failed op (resource_qclvmsles02_stop_0) for resource_qclvmsles02 on qclsles01
> > > > May 7 16:31:41 qclsles01 pengine: [22592]: WARN: unpack_rsc_op: unpack.c Handling failed stop for resource_qclvmsles02 on qclsles01
> > > > May 7 16:31:41 qclsles01 pengine: [22592]: info: process_orphan_resource: Orphan resource <lrm_resource id="resource_NFS" type="nfs" class="lsb" provider="heartbeat">
> > > > May 7 16:31:41 qclsles01 pengine: [22592]: info: process_orphan_resource: Orphan resource <lrm_rsc_op id="resource_NFS_monitor_0" operation="monitor" crm-debug-origin="build_active_RAs" transition_key="27:3a815bc6-ffaa-49b3-aac2-0ed46e85f085" transition_magic="0:0;27:3a815bc6-ffaa-49b3-aac2-0ed46e85f085" call_id="9" crm_feature_set="1.0.6" rc_code="0" op_status="0" interval="0" op_digest="08b7001b97ccdaa1ca23a9f165256bc1"/>
> > > > May 7 16:31:41 qclsles01 pengine: [22592]: info: process_orphan_resource: Orphan resource <lrm_rsc_op id="resource_NFS_stop_0" operation="stop" crm-debug-origin="build_active_RAs" transition_key="28:3a815bc6-ffaa-49b3-aac2-0ed46e85f085" transition_magic="0:0;28:3a815bc6-ffaa-49b3-aac2-0ed46e85f085" call_id="10" crm_feature_set="1.0.6" rc_code="0" op_status="0" interval="0" op_digest="08b7001b97ccdaa1ca23a9f165256bc1"/>
> > > > May 7 16:31:41 qclsles01 pengine: [22592]: info: process_orphan_resource: Orphan resource </lrm_resource>
> > > > May 7 16:31:41 qclsles01 pengine: [22592]: WARN: process_orphan_resource: unpack.c Nothing known about resource resource_NFS running on qclsles01
> > > > May 7 16:31:41 qclsles01 pengine: [22592]: info: create_fake_resource: Orphan resource <lrm_resource id="resource_NFS" type="nfs" class="lsb" provider="heartbeat">
> > > > May 7 16:31:41 qclsles01 pengine: [22592]: info: create_fake_resource: Orphan resource <lrm_rsc_op id="resource_NFS_monitor_0" operation="monitor" crm-debug-origin="build_active_RAs" transition_key="27:3a815bc6-ffaa-49b3-aac2-0ed46e85f085" transition_magic="0:0;27:3a815bc6-ffaa-49b3-aac2-0ed46e85f085" call_id="9" crm_feature_set="1.0.6" rc_code="0" op_status="0" interval="0" op_digest="08b7001b97ccdaa1ca23a9f165256bc1"/>
> > > > May 7 16:31:41 qclsles01 pengine: [22592]: info: create_fake_resource: Orphan resource <lrm_rsc_op id="resource_NFS_stop_0" operation="stop" crm-debug-origin="build_active_RAs" transition_key="28:3a815bc6-ffaa-49b3-aac2-0ed46e85f085" transition_magic="0:0;28:3a815bc6-ffaa-49b3-aac2-0ed46e85f085" call_id="10" crm_feature_set="1.0.6" rc_code="0" op_status="0" interval="0" op_digest="08b7001b97ccdaa1ca23a9f165256bc1"/>
> > > > May 7 16:31:41 qclsles01 pengine: [22592]: info: create_fake_resource: Orphan resource </lrm_resource>
> > > > May 7 16:31:41 qclsles01 pengine: [22592]: info: process_orphan_resource: unpack.c Making sure orphan resource_NFS is stopped
> > > > May 7 16:31:41 qclsles01 pengine: [22592]: info: resource_qclvmsles01 (heartbeat::ocf:Xen): Started qclsles01
> > > > May 7 16:31:41 qclsles01 pengine: [22592]: info: resource_qclvmsles02 (heartbeat::ocf:Xen): Started qclsles01 (unmanaged) FAILED
> > > > May 7 16:31:41 qclsles01 pengine: [22592]: info: resource_NFS (lsb:nfs): Stopped
> > > > May 7 16:31:41 qclsles01 pengine: [22592]: notice: NoRoleChange: native.c Leave resource resource_qclvmsles01 (qclsles01)
> > > > May 7 16:31:41 qclsles01 pengine: [22592]: notice: NoRoleChange: native.c Move resource resource_qclvmsles02 (qclsles01 -> qclsles02)
> > > > May 7 16:31:41 qclsles01 crmd: [22028]: info: do_state_transition: fsa.c qclsles01: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=route_message ]
> > > > May 7 16:31:41 qclsles01 pengine: [22592]: WARN: custom_action: utils.c Action resource_qclvmsles02_stop_0 stop is for resource_qclvmsles02 (unmanaged)
> > > > May 7 16:31:41 qclsles01 pengine: [22592]: WARN: custom_action: utils.c Action resource_qclvmsles02_start_0 start is for resource_qclvmsles02 (unmanaged)
> > > > May 7 16:31:41 qclsles01 pengine: [22592]: notice: stage8: allocate.c Created transition graph 13.
> > > > May 7 16:31:41 qclsles01 tengine: [22591]: info: unpack_graph: unpack.c Unpacked transition 13: 0 actions in 0 synapses
> > > > May 7 16:31:41 qclsles01 crmd: [22028]: info: do_state_transition: fsa.c qclsles01: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_IPC_MESSAGE origin=route_message ]
> > > > May 7 16:31:41 qclsles01 pengine: [22592]: WARN: process_pe_message: pengine.c No value specified for cluster preference: pe-input-series-max
> > > > May 7 16:31:41 qclsles01 tengine: [22591]: info: run_graph: graph.c Transition 13: (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0)
> > > > May 7 16:31:41 qclsles01 pengine: [22592]: info: process_pe_message: pengine.c Transition 13: PEngine Input stored in: /var/lib/heartbeat/pengine/pe-input-100.bz2
> > > > May 7 16:31:41 qclsles01 tengine: [22591]: info: notify_crmd: actions.c Transition 13 status: te_complete - (null)
> > > >
> > > > Thanks!
>
> Ah, and how am I supposed to know which node is concerned in the log? I can
> read:
>
> "May 7 16:31:41 qclsles01 crmd: [22028]: WARN: process_lrm_event: lrm.c LRM operation (35) stop_0 on resource_qclvmsles02 Error: (4) insufficient privileges"
>
> on my first node, and the same message except for the hostname on my second
> node.. so which one has a privileges problem? Both?
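
(For what it's worth, here is roughly how I pull these lines out on each
node, following the "grep ERROR logfile" suggestion; I'm assuming the
default syslog target, /var/log/messages, on our SLES hosts:)

  # the hostname field right after the timestamp identifies which node
  # wrote each entry (qclsles01 vs. qclsles02)
  grep -E "ERROR|insufficient privileges" /var/log/messages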
--
René Jr Purcell
Project lead, security and systems
Techno Centre Logiciels Libres, http://www.tc2l.ca/
Phone: (418) 681-2929 #124
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
